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For  analyzing  informatively  censored  failure  time  data,  a method  for  using  pro- 
portional hazards  models  in  a competing  risks  setup  is  developed  In  this  doctoral 
research.  The  proposed  models  represent  cases  in  which  the  times  to  a primary  event 
may  be  informatively  or  randomly  censored  and  the  times  to  a followup  event  may 
be  randomly  censored.  The  appropriate  partial  likelihood  is  expressed  using  counting 
process  representation.  Such  a representation  provides  a convenient  structure  to  es- 
tablish the  uniqueness  of  Maximum  Partial  Likelihood  Estimators  (MPLEs)  and  the 
limiting  normal  distribution  of  MPLEs.  Simulation  studies  are  used  to  examine  (i) 
the  effect  on  MPLEs  of  the  expected  censoring  proportions,  (ii)  the  appropriateness 
of  normal  approximation  to  the  sampling  distribution  of  the  MPLEs  for  small  and 
large  sample  sizes,  and  (iii)  the  performance  of  MPLEs  when  informative  censoring  is 
incorrectly  treated  as  random  censoring.  Usefulness  of  martingale  and  score  residual 
plots  for  detecting  model  misspecificatlon  is  demonstrated.  The  proposed  model  is 
applied  to  a real  data  set  from  a study  of  leukemia  patients,  and  the  results  and  the 
residual  plots  are  discussed  and  interpreted. 


CHAPTER  1 

INTRODUCTION  AND  LITERATURE  REVIEW 
1.1  Introduction 


When  analyzing  data  that  are  observed  times  to  an  event,  one  is  often  faced  with 
the  problem  of  dealing  with  right  censored  observations.  A right  censored  observation 
is  an  observed  time  that  is  known  to  be  less  than  the  time  of  the  occurrence  of  the 
event.  In  what  follows,  right  censored  observations  will  be  simply  referred  to  as 
censored  observations.  Censored  observations  are  common  in  longitudinal  studies 
and  clinical  trials,  where  subjects  may  be  lost  to  followup  for  various  reasons.  For 
example,  in  a study  to  evaluate  the  waiting  time  of  leukimia  patients  to  bone  marrow 
transplantation  (BMT),  the  time  to  occurrence  of  BMT  may  be  censored  by  relapse, 
death,  or  relocation  of  the  patient. 

In  the  following  work,  the  event  of  interest  will  be  referred  to  as  a failure.  Let  T 
be  time  to  failure,  and  C be  time  to  censoring.  Assume  that  T and  C are  continuous 
with  densities  /r(i|^i)  and  fc{A^2)i  where  9\  and  62  are  vectors  of  real  parameters. 
Then  the  likelihood  of  an  observed  time  can  be  expressed  as 

i = lfTm)]>[fc{c\e2)]'-‘,  (1.1) 

where  5 equals  one  or  zero  depending  on  whether  the  observed  time  is  uncensored 
(r  < C)  or  censored  (C  < T). 

Censoring  is  said  to  be  informative  with  respect  to  the  parameters  of  the  failure 
time  distribution  if  61  and  62  have  at  least  one  common  element.  That  is,  9i  and 
02  have  the  form  9i  = (/3,  /j,)  and  02  = (7,  where  n can  be  a scaler  or  a vector. 
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Then  (1.1)  can  be  expressed  as 

L(/3,7,/i)  = [fT{t\{0,  n))Y[fc{c\{j,  . 

If  di  and  62  do  not  have  any  elements  in  common,  censoring  is  said  to  be  noninfor- 
mative  and  the  likelihood  can  be  expressed  as 

Noninformative  censoring  is  commonly  referred  to  as  random  censoring,  and  we  will 
use  this  term  hereafter. 

Lagakos  (1979)  noted  that  the  assumption  of  random  censoring  in  clinical  trials 
is  inappropriate  when  the  dropouts  are  related  to  the  health  of  the  patient.  When 
censoring  is  informative,  the  estimator  of  survival  function,  S{t)  = P{T  > t)  which 
neglects  the  censoring  information,  will  be  an  overestimate  (underestimate)  if  the  cen- 
soring carries  an  unfavorable  (favorable)  prognosis  for  the  failure  time.  Nevertheless, 
many  statistical  methods  for  censored  survival  data  are  based  on  the  assumption  that 
censoring  is  random. 

In  this  work,  the  primary  objective  is  to  develop  a method  for  analyzing  failure 
time  data  subject  to  informative  censoring.  As  an  example  of  informatively  censored 
failure  time  data,  we  consider  the  data  from  a Pediatric  Oncology  Group  (POG)  study 
of  leukemia  patients.  The  study  was  conducted  to  compare  BMT  and  Chemother- 
apy as  treatments  for  childhood  acute  myelocytic  leukemia.  Eligible  patients  were 
randomized  to  receive  BMT  or  Chemotherapy  treatment.  We  will  use  the  data  on 
a subset  of  patients  who  were  randomized  to  BMT  to  assess  the  effects  of  covariates 
such  as  age,  gender,  race,  etc.  on  the  waiting  time  for  BMT.  Subjects  who  are  ran- 
domized to  BMT  may  experience  relapse  of  leukimia  or  death  before  receiving  BMT, 
where  some  subjects  may  still  be  followed  after  receiving  BMT  for  secondary  or  sur- 
vival information.  When  a subject  experiences  relapse,  the  subject  is  removed  from 
the  study  and  receives  a different  treatment.  Thus,  both  relapse  and  death  can  be 
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considered  as  censoring  events.  Particularly,  since  the  subjects  who  experience  relapse 
or  death  before  BMT  are  too  sick  to  receive  BMT,  it  is  reasonable  to  consider  that 
waiting  time  to  BMT  is  informatively  censored  by  relapse  or  death.  Consequently, 
analysis  arising  from  this  kind  of  study,  where  the  major  failure  variable  is  subject 
to  informative  as  well  as  random  censorings  poses  a challenge  of  how  to  incorporate 
such  informative  censoring. 

In  order  to  include  informative  censoring  when  making  inferences  about  survival 
functions,  one  can  assume  a joint  distribution  of  the  times  to  failure  and  informative 
censoring.  Alternatively,  one  can  treat  informative  censoring  as  a type  of  failure  that 
provides  information  about  the  failure  time  of  interest.  Then  the  observed  outcome 
data  are  regarded  as  times  to  failure  due  to  multiple  causes,  which  can  then  be 
analyzed  using  methods  based  on  competing  risks  models. 

1.2  Competing  risks 


Suppose  there  are  k mutually  exclusive  and  exhaustive  causes  of  failure,  and  let 
the  continuous  random  variable  T^,  v = 1, . . . ,k  denote  the  failure  time  due  to  cause 
V.  Then  % is  called  the  net  survival  time  corresponding  to  the  cause  v.  If  V denotes 
the  cause  of  failure,  then  one  actually  observes  X = min{Ti  ...Tk},  and  X = 
if  and  only  if  V = v.  The  pair  {X,  V)  will  be  called  the  identified  minimum  of  the 
failure  times.  Defining  the  net  hazard  function  of  as 


K{t) 


lim 

At->0+ 


P{t<Ty  <t  + Xt\%>  t) 
Af 


H,{t)  = P{T,  > t) 


hv{u)du). 


one  can  show  that 
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The  function  Hy{.)  is  called  a net  survival  function.  The  cause-specific  or  crude 
hazard  function  in  the  presence  of  k risks  is  defined  as 


K{t) 


lim 

At^0+ 


P{t  <X  <t  + At,  V 
At 


V \ X > t) 


..  P(t<Ty  <t  + At\Ti>t,l  = l...  k) 
hm  — ^ ^ 
A(->0+  At 


Since  causes  of  failure  are  mutually  exclusive  and  exhaustive,  the  overall  hazard  rate 
is 


k 

Kt)  = 

v=\ 


P(t  < X < t + At  \ X > t) 

hm  — ^ ! ^ 

At->0+  At 


One  can  obtain  a crude  survival  probability, 


Sy{t)  = P{X  > t,V  = V)  = P{Ty  > t,  > Ty))  . 

This  competing  risks  setup  is  related  to  the  situation  of  informative  censoring  if 
one  considers  two  failure  types,  V = 1 if  failure  and  F = 2 if  censored,  and  the 
distribution  of  T2  depends  on  the  parameter(s)  of  the  distribution  of  failure  time 
(Ti).  This  formulation  will  play  an  important  role  later  in  hazard  modeling  when 
we  account  for  informative  censoring  as  a type  of  failure  which  provides  information 
about  failure  time  parameter (s). 

In  typical  cases,  the  goal  is  to  determine  the  net  survival  function  of  failure  from 
the  observable  crude  survival  function.  That  is,  one  is  interested  in  the  probability 
that  the  subject  survives  up  to  time  t in  which  a particular  failure  is  the  only  possible 
risk.  Unfortunately,  without  the  assumption  of  independence  of  T„’s,  the  set  of  crude 
survival  functions  Sy{t)  is  consistent  with  an  infinity  of  joint  distributions  of  failure 
times  as  pointed  out  by  Cox  (1959)  in  the  bivariate  case  and  Tsiatis  (1975)  in  com- 
peting risks  setups.  However,  Hy{t)  can  be  identified  by  the  crude  survival  function 
provided  when  the  Ty’s  are  mutually  independent. 
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Therefore,  for  a given  set  of  crude  survival  functions,  one  cannot  identify  the  net 
survival  functions  without  the  assumption  of  independence  of  Ty’s  (Tsiatis,  1975). 
Due  to  these  unidentifiability  aspects  of  competing  risks  setup,  most  of  the  work  in 
this  area  has  been  done  in  a parametric  setup  wherein  one  specifies  a joint  distribution 
of  failure  and  censoring  times. 


1.3  Literature  Review 

1.3.1  Parametric  Approach 

A number  of  authors  have  proposed  methods  for  analyzing  multivariate  failure 
time  data  on  the  basis  of  an  assumed  parametric  model  for  the  joint  distribution  of 
Ti . . . Tfc.  For  the  case  k = 2,  Nadas  (1971)  proved  that  a bivariate  normal  distribution 
is  identifiable.  Basu  and  Ghosh  (1978)  have  extended  Nadas  work  and  showed  that 
trivariate  normal  distribution  is  partially  identifiable.  Basu  and  Ghosh  (1978,  1980) 
have  described  identifiability  of  bivariate  exponential  families  of  distributions  such 
as  those  of  Marshall  and  Olkin  (1967),  Block  and  Basu  (1974),  Gumbel  (1960),  and 
Freund  (1961).  In  particular,  Marshall-Olkin  (1967)  and  Gumbel  (1960)  models  are 
shown  to  be  identifiable  given  the  distribution  of  the  identified  minimum  (X,  V),  while 
the  other  models  are  shown  to  be  unidentifiable.  Moeschberger  (1974)  and  Emoto  and 
Matthews  (1990)  described  a bivariate  Weibull  distribution  arising  from  the  bivariate 
exponential  distribution  for  bivariate  failure  time  data.  The  latter  authors  showed 
that  the  bivariate  Weibull  distribution  is  identifiable  given  the  identified  minimum 
(X,G). 

Smith  and  Helms  (1995)  developed  a computationally  feasible  approach  to  esti- 
mate the  parameters  of  a multivariate  normal  distribution  of  the  responses  (failure 
times  and  censoring  times)  by  maximizing  the  likelihood  using  the  EM  algorithm 
technique. 
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Hoover  and  Guess  (1990)  introduced  dependence  between  failure  time  and  censor- 
ing time  using  the  notion  of  response  linked  (RL)  censoring.  Such  censoring  occurs 
when  censoring  time  is  highly  correlated  with  failure  time.  As  an  example  of  RL 
censoring,  consider  a study  to  evaluate  a program  to  quit  smoking.  The  failure  time 
is  a relapse  to  smoking,  and  a subject  may  be  censored  because  he  or  she  stops  going 
to  the  clinic  because  of  the  relapse.  Hoover  and  Guess  (1990)  developed  a method 
of  estimation  by  maximizing  the  likelihood,  and  the  presence  of  RL  censoring  can 
be  detected  by  the  likelihood  ratio  test.  These  authors  used  exponential  models  for 
illustrating  their  method. 

Baker,  Wax,  and  Patterson  (1993)  treated  failure  time  and  censoring  time  as 
discrete  variables  by  dividing  time  into  intervals,  and  assumed  a multinomial  model 
for  the  joint  distribution  of  failure  and  censoring  times.  They  proposed  a coupled 
logistic  regression  model  which  consists  of  one  regression  for  the  hazard  of  failure 
given  covariates  and  another  for  the  hazard  of  censoring  given  covariates  and  time  of 
failure.  These  authors  used  a model  similar  to  the  response-linked  censoring  model 
of  Hoover  and  Guess  (1990)  in  that  informative  censoring  is  included  by  a term  that 
accounts  for  contribution  to  the  model  when  failure  occurs  in  the  interval  of  censoring. 

There  exists  a substantial  amount  of  literature  on  the  use  of  parametric  models  to 
account  for  informative  censoring  in  repeated  measurements  setup.  In  these  works, 
the  focus  is  not  on  failure  time.  Rather,  the  observed  responses  are  serial  measure- 
ments of  a response  variable  taken  on  a subject  at  different  time  points.  Among 
these  are  the  papers  by  Wu  and  Garroll  (1988),  Schluchter(1992),  Mori,  Woolson, 
and  Woodworth  (1994),  and  Gruttola  and  Tu  (1994).  The  main  focus  in  these  works 
is  on  estimating  and  comparing  rates  of  change  of  the  response  variable.  Various 
linear  random  effects  models  have  been  proposed  in  the  context  of  informative  cen- 
soring, where  censoring  provides  information  about  the  underlying  rate  of  change  for 
the  response  variable. 
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The  notion  of  informative  censoring  can  be  used  for  analysis  of  missing  or  in- 
complete data.  If  the  variable  of  interest  is  not  a failure  time,  it  is  reasonable  to 
consider  censored  data  as  incomplete  or  missing  data.  One  could  relate  incomplete 
data  with  nonresponses  to  informative  censoring,  where  nonresponses  are  nonignor- 
able.  As  noted  in  Little  (1982),  “The  response  mechanism  is  said  to  be  nonignorable 
if  it  depends  on  a subject’s  unobserved  response.”  This  is  related  to  the  notion  of 
informative  censoring  because  nonresponses  provide  information  about  the  response 
variable.  Various  models  of  nonignorable  nonresponses  have  been  explored  for  survey 
and  observational  studies  by  Little  (1982),  Laird  and  Ware  (1982),  Nordheim  (1984), 
Stasny  (1985),  Baker  and  Laird  (1988)  among  others.  Heyting,  Tolboom,  and  Essers 
(1992)  reviewed  and  compared  different  methods  of  overcoming  the  problem  of  miss- 
ing data  with  the  assumption  that  dropouts  depend  only  on  the  observed  responses 
in  a clinical  setting. 

Williams  and  Lagakos  (1977)  examined  a general  situation,  where  the  cause  of 
censoring  may  depend  on  failure,  but  the  dependence  does  not  influence  inferences 
about  the  survival  function  of  failure  time.  Such  a condition  is  referred  by  these 
authors  to  as  a constant-sum  condition.  However,  due  to  unidentifiability  problems 
the  constant-sum  condition  is  not  testable  although  it  can  be  partially  tested  under 
the  exponential  model.  Kalbfleisch  and  MacKay  (1979)  pointed  out  that  constant- 
sum  condition  is  equivalent  to  a simple  relationship  between  the  hazard  functions  for 
failure  and  censoring  times. 

In  a setting  similar  to  the  one  used  by  Williams  and  Lagakos  (1977),  Lagakos 
and  Williams  (1978)  proposed  a class  of  variable-sum  models  in  terms  of  the  odds  of 
observing  a failure  and  a scale  parameter  9 which  measures  the  influence  of  censoring. 
In  this  model,  9 = 1 corresponds  to  random  censoring,  and  the  resulting  model  is  the 
constant-sum  model.  When  0 < 0 < 1,  censoring  is  informative,  and  the  resulting 
model  is  the  variable-sum  model.  Parametric  assumptions,  such  as  the  assumption 
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of  exponentially  distributed  failure  times  are  needed  to  estimate  9,  survival  function, 
and  the  odds  function. 

Parametric  assumptions  about  survival  and  censoring  distributions  are  often  hard 
to  justify  in  practice.  A review  of  the  relevant  literature  on  nonparametric  and  semi 
parametric  models  to  incorporate  informative  censoring  will  be  presented  next. 

1.3.2  Nonparametric  &:  Semiparametric  Approaches 

Several  nonparametric  and  semiparametric  methods  exist  for  analyzing  data  that 
are  repeated  measurements  of  an  outcome  variable  taken  at  different  followup  times. 
Some  recent  papers  in  this  area  are  by  Dawson  and  Lagakos  (1993)  and  Rotnitzky  and 
Robins  (1995).  In  these  papers,  missing  observations  are  regarded  as  informatively 
censored  observations  because  the  cause  for  their  missing  may  be  related  to  the  value 
of  the  outcome  variable.  However,  the  primary  objective  in  these  papers  was  to 
compare  responses  to  treatments  on  the  basis  of  repeated  measurements  not  on  the 
basis  of  times  to  an  event. 

When  covariates  are  not  used,  and  censoring  is  random,  the  Product-Limit  (PL) 
estimator  proposed  by  Kaplan  and  Meier  (1958)  is  a widely  used  nonparametric  esti- 
mator of  survival  function  S{t).  Let  t\  <t2  < ..  < be  the  N observed  times.  Then 
the  PL  estimate  of  S{t)  is  defined  as 

•S'(f)  = + 1)1’ 

r 

where  the  product  is  taken  over  all  integer  values  of  r such  that  U < t,  and  tr  is 
an  uncensored  observed  failure  time.  However,  as  noted  by  Lagakos  (1979),  in  the 
presence  of  informative  censoring,  the  PL-estimator  will  generally  overestimate  or 
underestimate  the  true  survival  function. 

In  an  attempt  to  account  for  informative  censoring,  Fisher  and  Kanarek  (1974) 
introduced  a survival  model  in  which  the  effect  of  informative  censoring  is  determined 
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by  a scale  parameter  a that  stretches  or  contracts  the  survival  time.  If  T and  U 
are  survival  and  informative  censoring  time,  Fisher-Kanarek  model  is  based  on  the 
assumption: 

P{T  > t\U  = u < t)  = P{T  > u + a{t  — u)\U  > u + a{t  — u)),  (1.2) 

where  a > 0 is  a parameter  that  measures  the  amount  of  information  about  T pro- 
vided by  U.  According  to  (1.2),  the  probability  of  surviving  up  to  time  t for  a subject 
who  was  censored  at  time  u {<  t)  is  the  same  as  the  probability  of  surviving  up  to 
u + a{t  — u)  for  a subject  who  was  uncensored  up  to  that  time.  Note  that  a — 1 im- 
plies that  U is  random,  and  large  (small)  a corresponds  to  poor  (favorable)  prognosis 
for  subsequent  survival.  Peterson  (1976)  derived  bounds  for  survival  function  under 
(1.2).  Slud  and  Rubinstein  (1983)  and  Klein  and  Moeschberger  (1988)  strengthened 
the  Peterson  bounds. 

The  nonparametric  and  semiparametric  methods  described  thus  far  were  devel- 
oped with  the  objective  of  estimating  the  survival  function  when  there  are  no  covari- 
ates. However,  there  usually  are  covariates  upon  which  the  survival  function  may 
depend.  Therefore,  it  is  of  interest  to  develop  methods  of  inferences  about  failure 
time  distributions  that  account  for  concomitant  information  in  the  covariates. 

In  one  approach  to  incorporate  covariate  information,  a covariate  is  considered  as 
an  unknown  frailty,  and  used  to  model  survival  function  and  association  in  bivariate 
distribution.  For  example,  suppose  that  the  time  to  failure  and  time  to  informative 
censoring,  T and  U,  are  independent  conditional  on  an  unknown  covariate  (frailty) 
Z = z,  so  that 

S{t,u\z)  = ST{t\z)Su{u\z) 


S{t,  u) 


I 


ST{t\z)Su{u\z)f{z)dz, 


and 
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where  f{z)  is  the  density  function  of  a frailty  Z.  In  this  context,  frailty  models  have 
been  used  extensively  to  estimate  the  association  in  bivariate  distributions.  Clayton 
(1978)  proposed  a model  in  which  two  events  shared  a common  influence  (frailty), 
but  one  event  does  not  influence  the  other.  In  such  model,  bivariate  failure  rate  can 
be  expressed  in  terms  of  an  association  parameter  6 and  two  marginal  failure  rates. 
Specifically,  l{t,u)  = 9g{t-,u)h{u;t),  where  l{t,u)  is  the  bivariate  failure  rate,  and 
g{t]u)  and  h{u;t)  are  the  marginal  failure  rates.  By  assuming  the  conditional  inde- 
pendence of  T and  U given  the  common  frailty,  the  maximum  likelihood  estimator  of 
9 can  be  obtained  from  the  log  likelihood  derived  from  the  application  of  proportional 
hazards  model  of  Cox  (1972). 

Clayton  and  Cuzick  (1985)  used  a bivariate  log-linear  model  to  describe  the  asso- 
ciation between  T and  U.  If  (T,  U)  is  the  bivariate  survival  and  informative  censoring 
time,  Clayton  and  Cuzick  (1985)  assumed  that 

logT  = PiZ  + 6i 


and 


loglJ  = + 62, 

where  (61,62)  are  independent  random  variables,  and  Z is  an  unobserved  covariate. 
They  showed  that  an  efficient  test  for  association  between  two  time  variables  can 
be  based  on  the  generalized  Savage  scores  (log-rank  scores).  To  obtain  an  efficient 
estimator  of  association,  an  iterative  method  can  be  developed  with  scores  based  on 
the  generalized  Savage  scores. 

Link  (1989)  modifled  the  PL-estimator  by  assuming  that  censoring  occurs  in  a 
high-risk  or  low-risk  subpopulation  defined  by  a frailty  distribution  represented  by 
a random  variable  Z such  that  the  hazard  at  given  time  t with  frailty  Z = z is 
IJ.{t\z)  = zg,{t).  Link  (1989)  suggested  that  the  quantity  nit)  can  be  interpreted 
as  the  age  effect  and  is  independent  of  Z.  Survival  function  conditional  on  Z = z 
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is  given  by  S{t\Z  — z)  = exp{—  zjj,{u)du),  so  that  the  objective  is  to  estimate 
S{t)  = E{S{t\z)).  By  using  a specific  frailty  distribution  such  as  a unit  exponential, 
Link  (1989)  showed  that  a modified  PL-estimator  can  be  obtained  from  the  self- 
consistency  algorithm. 

Xue  and  Brookmeyer  (1996)  extended  the  univariate  fraility  model  to  the  bivariate 
case  by  proposing  a bivariate  lognormal  frailty  model  in  which  two  possibly  associ- 
ated frailities  (Zi,  Z2)  influence  the  bivariate  survival  and  informative  censoring  time 
{T,U).  The  main  advantage  of  using  a bivariate  frailty  model  is  the  allowance  for 
negative  associations  between  T and  U,  which  the  univariate  frailty  model  does  not 
permit.  However,  implementing  the  bivariate  frailty  model  requires  a fairly  intense 
computation. 

An  important  example  where  informative  censoring  occurs  is  a clinical  trial  in 
which  the  study  subjects  are  followed  after  the  first  failure.  Note  that  in  the  BMT 
example  described  in  Section  1.1,  the  subject  is  followed  after  BMT  (first  failure)  for 
additional  information  such  as  relapse  or  death,  and  if  relapse  or  death  occurs  before 
BMT  then  time  to  BMT  is  informatively  censored. 

Under  such  a setting,  Lin,  Robins,  and  Wei  (1996)  proposed  a semiparametric  ap- 
proach to  adjust  for  informative  censoring  when  comparing  failure  time  distributions 
in  two  groups.  Let  T be  time  to  failure,  U be  time  to  informative  censoring,  C be  time 
to  random  censoring,  and  Z be  treatment  indicator.  They  assumed  that  there  exists 
60  and  T]o  such  that  [logT  — 9qZ,  logU  — rjoZ)  satisfies  a bivariate  location-shift  model 
with  a completely  unspecified  joint  distribution.  A consistent  estimator  of  r/o  can  be 
obtained  by  inverting  the  log-rank  statistic.  However,  for  consistently  estimating  9q 
an  artificial  transformation  is  needed.  Also,  the  method  of  Lin  et  al.  (1996)  is  limited 
to  dichotomous  covariates. 

Kalbfleisch  and  Prentice  (1980)  have  described  methods  for  hazard  modeling  at 
subsequent  failure  time  beyond  the  first  failure  in  a competing  risks  setup.  These 
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methods  can  be  adapted  for  using  the  proportional  hazards  model  of  Cox  (1972)  and 
modeling  the  hazards  of  failure  and  informative  censoring.  Several  models  in  a similar 
setting  were  studied  by  Prentice,  Williams,  and  Peterson  (1981),  Kay  (1982),  Klein, 
Keiding,  and  Kamby  (1989),  Demasi  (1994)  among  others. 

Prentice  et  al.  (1981)  modeled  hazard  functions,  where  a subject  faces  multiple 
failures  of  the  same  type.  They  proposed  two  proportional  hazards  functions,  both  of 
which  depend  on  the  number  of  previous  failures  of  the  subject.  Let  n{t),  and 
Z be  the  number,  time  of  failures  just  prior  to  time  t,  and  the  vector  of  covariates, 
respectively.  The  conditional  hazards  at  time  t given  n{t)  and  covariates  Z = z can 
be  expressed  as 

A{t|n(t),z}  = Xos{t)exp{z0,} 


and 


A{t|n(t),  z}  = Aos(t  - tn(t))exp{zf3J, 

where  s is  the  stratification  variable  based  on  the  number  of  failures,  the  Aqs(.)  are  ar- 
bitrary baseline  stratum-specific  hazard  functions,  and  the  are  the  stratum-specific 
regression  parameters.  The  first  model  allows  for  an  arbitrary  baseline  hazard  depen- 
dent on  the  total  time  of  the  study,  while  the  second  model  allows  for  an  arbitrary 
baseline  hazard  that  depends  on  the  time  from  the  subject’s  previous  failure.  Infer- 
ences about  the  regression  parameters  can  be  drawn  from  the  partial  likelihood. 

Kay  (1982)  extended  Cox’s  proportional  hazards  model  to  a multistate  framework 
by  allowing  for  several  transient  disease  states  between  initial  entry  and  death.  The 
regression  parameters  and  the  transient-specific  hazards  can  be  estimated  in  this 
framework. 

Klein  et  al.  (1989)  and  Demasi  (1994)  considered  the  proportional  hazards  models 
for  analyzing  consecutive  events,  where  events  are  dependent.  The  former  authors 
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focused  on  the  marginal  hazards  arising  from  a semiparametric  model  of  Marshall- 
Olkin  (1967)  for  modeling  same  type  of  failure  occurring  at  different  sites.  If  Tj  is 
the  time  to  failure  at  site  j,  j = 1,2,  and  Z is  the  covariate  vector,  Klein  et  ah  (1989) 
assumed  that  the  joint  survival  function  of  {Ti,T2)  given  Z = z is 

S{ti,t2\z)  = exp{-Hi{ti\x)  - H2{t2\x)  - H3{max{tx,t2)\z)},  (1.3) 

where  hj{u\z)du,  j = 1,2,3  are  the  cumulative  hazard  functions,  and 

j = 3 implies  sites  1 and  2 simultaneously.  The  marginal  hazards  for  (1.3)  are 

Xk{t\z)  ==  hk{t\z)  + h3{t\z),  k = 1,2. 

The  hazard  at  time  t occurring  at  site  j'  after  site  j j)  is 

= ho{t)exp{jj){exp{zl3j>  + ay)  + exp{z(3:x)), 

where  ho{t)  is  a common  baseline  hazard,  /3y,  j =1,2  and  /3^  are  vectors  of  site- 
specific  regression  parameters,  a^'  is  a parameter  that  measures  the  likelihood  of 
failure  at  site  j',  and  jj  measures  the  influence  from  the  previous  site  j which  con- 
tributes only  when  a failure  occurs  at  the  current  site  f . Although  only  one  type  of 
failure  is  considered,  the  simultaneous  influence  on  the  two  events  can  be  analyzed 
from  this  model. 

Demasi  (1994)  constructed  a conditional  hazard  model  by  assuming  that  the  first 
type  of  failure  affects  the  second  type.  The  conditional  hazard  of  type  v at  time  t 
given  the  previous  type  and  time,  v and  xi,  v , and  the  covariate  vector  Z = z is 

{t)exp{ayXi  -\-z(3)I{t  > xi), 

where  (t)  is  the  arbitrary  baseline  hazard,  f3  is  the  vector  of  regression  parameters, 
z is  the  covariate  vector,  and  can  be  interpreted  as  a dependence  parameter.  Here, 
he  excluded  the  simultaneous  influence  of  the  two  events. 
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These  are  models  for  times  to  two  types  of  event  subject  to  random  censoring. 
In  this  dissertation,  we  are  interested  in  adapting  these  models  for  describing  failure 
times  that  are  subject  to  both  random  and  informative  censorings. 

1.3.3  Concluding  Remarks 


As  we  have  seen  in  Section  1.3.2,  little  work  has  been  done  with  semiparametric 
models  for  multivariate  failure  times  in  the  presence  of  informative  censoring.  Al- 
though one  can  try  a parametric  approach,  it  is  often  not  easy  to  verify  that  the  ob- 
served failure  times  satisfy  a specific  parametric  distribution.  Consequently,  we  focus 
on  semiparametric  models  for  analysis  of  informatively  censored  data.  We  consider 
the  use  of  the  proportional  hazards  model  of  Cox  (1972)  because  it  accommodates  a 
flexible  number  of  covariates  with  easily  interpretable  parameters.  We  will  develop 
a method  to  use  proportional  hazards  models  to  account  for  informative  censoring 
which  provides  information  about  major  failure  time  parameter(s)  in  a competing 
risks  setup. 


1.4  Overview  of  Research 


In  Chapter  2,  modeling  of  the  proportional  hazards  function  is  developed,  and  the 
appropriate  partial  likelihood  is  constructed.  Then  the  associated  partial  likelihood  is 
written  in  counting  process  representation  which  is  expressed  in  a convenient  structure 
to  examine  the  asymptotic  properties  of  maximum  partial  likelihood  estimators  and 
residuals.  The  regression  parameters  are  estimated,  and  in  this  estimating  process, 
the  concavity  of  log  partial  likelihood  is  proven  to  guarantee  the  unique  maximization. 
Asymptotic  properties  of  maximum  partial  likelihood  estimators  are  derived  using  the 
counting  process  representation.  In  Chapter  3,  simulation  studies  are  presented  which 
examine  characteristics  of  the  maximum  partial  likelihood  estimators.  The  effect  of 
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expected  censoring  rates  on  the  estimators,  a normal  approximation  to  the  sampling 
distributions  of  the  estimators,  and  performance  of  estimators  when  censoring  is  as- 
sumed random  when,  in  fact,  it  is  informative  are  considered.  In  Chapter  4,  residual 
analysis  using  martingale  and  score  residuals  is  studied  by  simulation  of  residuals.  In 
Chapter  5,  the  real  data  set  from  a Pediatric  Oncology  Group  study  is  applied,  and 
in  Chapter  6,  ideas  for  further  research  are  discussed. 


CHAPTER  2 

PROPORTIONAL  HAZARDS  MODEL  WITH  INFORMATIVE  CENSORING 

2.1  Introduction 


In  this  chapter,  we  examine  the  use  of  proportional  hazards  models  for  modeling 
informatively  censored  failure  time  data.  The  proportional  hazards  model  has  been 
used  extensively  since  Cox  (1972)  introduced  it  in  the  context  of  univariate  failure 
time  data.  Extension  of  the  proportional  hazards  model  to  bivariate  failure  time 
data  with  two  failure  types  will  be  introduced  and  adapted  to  model  informatively 
censored  failure  times. 

The  appropriate  likelihood  and  partial  likelihood  are  derived  in  Section  2.2.  A 
counting  process  representation  of  the  informatively  censored  failure  times  will  be 
introduced  in  Section  2.3.1,  and  the  associated  partial  likelihood  will  be  expressed 
in  terms  of  counting  processes  and  multiplicative  intensity  processes.  Estimation  of 
regression  parameters  in  the  underlying  multiplicative  intensity  processes  will  follow 
in  Section  2.3.2.  Finally,  the  asymptotic  properties  of  maximum  partial  likelihood 
estimators  will  be  derived  in  Section  2.3.3. 
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2.2  Proportional  Hazards  Modeling 
2.2.1  Bivariate  Failure  Time  Data  with  Competing  Risks 

Proportional  Hazards  Model 


In  this  section,  we  follow  Kalbfleisch  and  Prentice  (1980)  to  describe  how  infor- 
mative censoring  and  proportional  hazards  models  can  be  handled  in  the  setting  of 
competing  risks.  Suppose  that  each  subject  will  experience  either  or  both  of  two  types 
of  mutually  exclusive  events.  Let  be  the  time  at  which  the  subject  experiences 
event  type  w (w  = 1,  2),  so  that  X = min{T\,T2}  is  the  time  of  first  event.  We  assume 
that  (Ti,T2)  is  continuous.  If  is  the  type  of  the  first  event,  then  if  and 

only  if  X = Tyi . Let  Z denote  the  vector  of  covariates.  For  simplicity,  assume  that  Z 
does  not  depend  on  time.  Then  conditional  on  Z = z,  the  crude  hazard  of  first  event 
at  time  t is  defined  as 


A„i(t|z) 


lim 

At->-0+ 


P{t  < X < t + At, 
At 


X > t,z) 


= 1,2. 


The  corresponding  overall  hazard  and  survival  functions  are 


A(i|z)  = ^ A„i(t|z), 


1)1 =1 


and 


5(t|z)  = exp{—  / \{u\z)du) 

Jo 

, respectively. 

Next,  let  Y and  denote  time  and  type  of  the  second  event.  The  conditional 
hazard  at  time  t of  given  = v'^ , X = x\  and  Z = z is 
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P{t<Y  <t  + At, = v^  \ Y > t,  =v\X  = xi,  z) 
At->0+  At 

The  corresponding  overall  conditional  hazard  and  survival  functions  are 

2 

A(t|xj,z)  ^ ^ (t  jxj , z) 


and 


5(t|xi,  z)  = ea:p(—  / X{u\xi,z)du) 

J Xi 

, respectively.  Then  the  joint  hazard  of  first  event  at  Xi  and  second  event  at  t 
given  Z = z is 

(^1  ? ^ 1^)  Ayl  (xj  |z  j A|j2  jijl  jjy  1 , Z j . 

Following  Demasi  (1994),  we  assume  the  proportional  hazards  model 

A„i(t|z)  = Xoyi{t)exp{z(3) 


and 


A„2|i,i(t|xi,z)  = Xoy2{t)exp{zf3)h{xi)l {t  > Xi),  (2.1) 

where  Aq,,;  is  the  unspecified  baseline  hazard  function  (i  = 1,  2),  /3  is  a vector  of 
regression  parameters,  and  h{xi)  is  an  appropriately  specified  function  that  accounts 
for  dependence  between  the  first  and  second  failure  times.  Note  that  the  conditional 
hazard  (2.1)  is  identically  zero  when  Xi  > t. 

Proportional  Hazards  Model  for  BMT  Data 


Let  us  now  consider  the  BMT  example  described  in  Section  1.1  to  illustrate  how 
a model  of  the  form  (2.1)  can  be  used  to  accommodate  informative  censoring.  A 
patient  can  experience  two  types  of  events,  BMT  and  relapse  or  death.  Since  time  to 
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transplantation  is  the  outcome  variable  of  interest,  we  may  regard  BMT  as  the  event 
of  interest,  and  refer  to  time  to  BMT  as  time  to  event  type  1.  A patient  who  relapses 
or  dies  before  BMT  will  have  his/her  time  to  event  type  1 informatively  censored  by 
relapse  or  death.  Let  us  refer  to  relapse  or  death,  whichever  occurs  first,  as  event 
type  2.  If  T is  the  time  to  BMT  (event  type  1)  and  U is  the  time  to  relapse  or  death, 
whichever  occurs  first  (event  type  2),  the  model  (2.1)  can  be  adapted  to  represent  the 
hazards  associated  with  T and  U. 

There  are  a variety  of  possible  forms  of  h{xi)  that  can  be  used  for  modeling  BMT 

data.  A simple  form  containing  an  easily  interpretable  parameter  a is 

h{x\)  = exp{axi)I{y^  = l,w^  = 2).  (2-2) 

Substituting  (2.2)  into  (2.1),  the  model  (2.1)  can  be  expressed  as 
A„i(t|z)  = Ao„i(t)exp(z/3),  — 1,2 

and 


A„2|„i(t|a;i,  z)  = Xoy2{t)exp{z/3  + Xia)l{t  > Xi,Vi  = 1,  r>2  = 2).  (2.3) 

Since 

A2|i(t|a:i  + l,z)  ^ 

A2ji(t|a:i,z) 

q;  can  be  interpreted  as  the  log  of  the  hazard  of  relapse  or  death  at  time  t for  a subject 
who  had  BMT  at  time  + 1 relative  to  a subject  who  had  BMT  at  time  Xi.  Thus, 
a is  a measure  of  the  effect  of  the  time  to  BMT  on  the  time  to  relapse  or  death. 
Depending  on  a > 0 or  a < 0,  the  risk  of  relapse  or  death  increases  or  decreases  with 
increasing  time  to  BMT.  A similar  interpretation  can  be  made  for  (3. 

The  Likelihood  Function 


The  likelihood  function  for  randomly  censored  survival  time  data  under  the  pro- 
portional hazards  model  (2.1)  can  be  constructed  as  follows.  As  before,  let  T and  U 
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be  the  times  to  events  of  types  1 and  2,  respectively,  and  let  C denote  the  time  at 
which  a subject  is  randomly  censored.  Then  the  observed  outcome  for  a subject  can 
be  classified  into  four  categories  as  shown  in  Table  2.1. 


Table  2.1.  Observed  outcome 


Category 

Outcome  characteristic 

Observed  variables 

1 

C <T,C  <U 

C 

2 

U <T,U  <C 

U 

3 

T <C  <U 

T,C 

4 

T <U  <C 

T,U 

In  the  BMT  example,  a category  4 patient  is  one  who  relapses  or  dies  after  BMT, 
while  a category  3 patient  has  BMT,  but  is  lost  to  followup  before  relapse  or  death. 
Category  2 and  1 patients  do  not  have  BMT  because  they  relapse,  die,  or  are  lost  to 
followup  before  BMT.  If  we  let  = I{T  < U f\C),  where  U f\C  = min{U^C]  and 
^ = I{U  < C),  then  it  can  be  verified  that  (1  — 5)(1  — ^),  (1  — tf(l  — (^),  and  5^ 
are  the  indicator  functions  of  categories  1,  2,  3,  and  4,  respectively. 

Assume  that  (T,  U)  and  C are  independent.  Then  a subject  who  is  followed  until 
the  second  event  at  time  t after  experiencing  the  first  event  at  time  xi  contributes 
5(a;i  |z)Ai,i  (xi  |z)5(f|xi,  z)A„2|„i  z)  to  the  likelihood.  Similarly,  a subject  who 
experienced  only  the  first  event  at  time  t contributes  |z)Aj,i  (f|z).  Hence,  if  Lj  is 
the  contribution  to  the  likelihood  from  a subject  in  category  i,  then  the  contributions 
are 

Li  = 

La  = {5(f|z)A2(t|z)}f'“'^)^, 

Ls  = {5(a:i|z)Ai(a;i|z)S'(f|a;i,z)}'*(^"^\ 


and 


Li  = {5'(xi|z)Ai(xi|z)5(f|a;i,z)A2|i(^|a:i,z)}‘^^. 
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Consequently,  the  likelihood  is  proportional  to  the  product  of  the  four  Lf 

C = L1L2L3L4.  (2.4) 


2.2.2  Partial  Likelihood 


Let  (Tj,  Ui,  Ci,  Zj),  i = 1 . . .n  denote  the  failure  time,  informative  censoring  time, 
random  censoring  time,  and  the  vector  of  covariates  of  n subjects.  A partial  likelihood 
can  be  constructed  on  the  basis  of  the  likelihood  (2.4).  Suppose  there  are  kyi  observed 
times  to  the  first  event  v^,  x^i\  < Xyi2  < ■ ■ . < 1 > = 1)  2,  and  there  are  /„2 

observed  times  to  the  second  event  ty2i  < 1^22  < ..  < ^ . Also,  let 

7^i(x„ij)  and  7?-2(f„2j)  be  the  risk  sets  for  the  first  and  second  events  at  times  and 
f„2j,  respectively.  Then  the  information  in  the  data  can  be  represented  as  a sequence 


(Qj;l  1 ) -fljl  1)  • • • ) , kkj,2j  , 5i,2j  , . . . , ) 

in  the  following  way.  For  the  first  event  and  time,  let  be  the  event  that  subject 
i experiences  event  at  time  Xyii,  and  Qyn  specify  the  risk  set  7^i(x„ij),  all  the 
covariate,  and  censoring  information  in  the  interval  [xyi(^i-i),Xyii),  i = 1 . . . + 1 
with  3;„io  = 0 and  = 00.  Similarly,  for  the  second  event  and  time,  let  5„2j 

be  the  event  that  subject  i fails  from  event  at  time  f„2j,  and  14^2^  specify  the  risk 
set  7^2(^ij2i),  all  the  covariate,  and  censoring  information  in  the  interval  t„2j), 

i = \ . ,ly2  + I with  ty2Q  = 0 and  = 00. 

Then  the  full  likelihood  for  the  first  event  and  time  can  be  written  as 


2 fc„i+i 


\^yl  )Vi,l  b 


lll=l  2=1 


«1  = 1 2=1 


where  = {Q„ii,...  ,Qvh),  and  consists  of 

all  information  prior  to  time  0.  Assuming  that  censoring  provides  little  information 
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about  covariate  effects  on  failure  rates,  the  second  product  will  be  ignored,  so  that 
the  likelihood  is  reduced  to 

2 


t;l=l  j=l 

Similarly,  the  full  likelihood  for  the  second  event  and  time  conditional  on  the  first 
event  and  time  can  be  written  as 

i=l  j— 1 

The  partial  likelihood  for  the  second  event  and  time  conditional  on  the  first  event 
and  time  is  obtained  by  ignoring  the  second  product: 

y2^yl 

The  partial  likelihood  for  all  data  is 

VC  = VC^^^  xVC^^l 

Under  the  proportional  hazards  model  (2.1)  and  the  assumption  that  only  one 
event  occurs  at  each  and  1^2^,  the  partial  likelihoods  for  the  first  and  second 
event  times  reduce  to 


2 ^ 

^ V 


»’4‘>=nn  ^ 

ui=i  i=i 


Ajljl  {Xylj^ 


2 *:„! 

JlMi  \^men,Mexp{z^f3) 


exp{zS) 


and 


J’4^’=nif\s  A rt  I ) 


2 ^ 
^ V 


p{zi(3)h{xu) 


\^meTZ2{t^2peXp{Zm(3)h{xim) 
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Hence,  the  overall  partial  likelihood  is 

VCn  = X ^ 


nn 

iii=i  1=1 


exp{zif3) 


nn 

i— 1 


exp{zi(3)h{xii) 


:^^.)exp{z^f3)h{xim)  J ' ^2.5) 


2.3  Properties  of  Maximum  Partial  Likelihood  Estimators 
2.3.1  Counting  Process  Representation  of  Partial  Likelihood 

A counting  process  representation  of  the  partial  likelihood  (2.5)  can  be  obtained 
as  in  Fleming  and  Harrington  (1991).  Let  Xu  = min{Ti,  Ui,  Cj},  X2i  = min{Ui,  Q}, 
Si  = I{Ti  < Uif\Ci),  and  E,i  = I{Ui  < Cj).  Define  the  processes  Nui,  Ni2i,Yu,  N2i, 
and  Y2i  as 

•^lli(^)  = ^{Xii  < t,  = 1}, 

Nm{t)  = l{Xu<t,Si  = 0,^,  = l}, 

Yuit)=l{Xu>t}, 

^2i{t)  = l{X2i  < t,6i  = 1,  = 1}, 


Y2i{t)  — I{Xii  < t,X2i  > t}- 


Given  the  counting  process  N,  let  AN{u)  = 1 if  N{u)  — N{u—)  = 1,  and 
AN{u)  = 0 otherwise.  Then  the  partial  likelihood  (2.5)  can  be  expressed  in  terms  of 
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Niu,Ni2i,N2i,Yu,  and  ¥21  as 


^^»=nnn 


5=1 i=l  0<u 


Yii{u)exp{zi(3) 
ELi  Yij{u)exp{zj(3) 


1 AA''i,i(u) 


X 


nn 

i=l  0<u 


Y2i{u)exp{ziP)h{xii) 
ELi  Y2j{u)exp{zj/3)h(xij) 


1 AN2i(u) 


Hereafter,  we  assume  that: 


(2.6) 


1.  The  processes  Nm,  Nm,  and  A^2i  are  adapted  to  the  respective  a-fields  of  histories 

Tint  - A^iii(u),  Yii{u+)  :0  <u  <t}, 

^Uit  = cr{Zi,  Ni2i(u),  Yii{u+)  : 0 < u < t}, 


and 


T2U  = cr{Zj,  Xii,  N2i{u),  Y2i{u+)  :0  <u  <t}. 


2.  The  compensators  of  Nm,  Nm,  and  A^2i  are  of  the  form 
Aui{t)  = / am{u)du, 

Jo 


A 


12i 


(t)  = / ai2i{u)du, 

Jo 


(2.7) 


and 


A 


2i 


{t)=  [ a2i{ 

Jo 


u)du, 


where 


aui{u)  = Xon{u)exp{zif3)Yu{u), 
ai2i{u)  = Xoi2{u)exp{zi^)Yii{u), 


and 


a2i{u)  = Xo2{u)exp{zi/3  + xua)Y2i{u)I{u  > Xn). 
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Then  as  shown  in  Fleming  and  Harrington  (1991),  the  partial  likelihood  (2.6)  can 
be  interpreted  as  the  partial  likelihood  arising  from  the  processes  Nm,  Ni2i,  and  N2i 
with  compensators  ^uj,  Hi2j,  and  A2i- 

In  the  BMT  example,  Nm  and  Nm  are,  respectively,  the  processes  that  count  the 
number  of  BMTs  (type  1 events)  and  the  number  of  relapse  or  death  before  BMT 
(informative  censorings),  with  the  associated  at-risk  process  Yu.  The  processes  A^2i 
and  Y2i  are  the  counting  and  at-risk  processes  of  the  relapse  or  death  (type  2 events) 
experienced  after  BMT,  respectively. 

2.3.2  Estimation  of  Regression  Parameters 


In  this  section,  we  assume  the  form  (2.2)  for  h{xi)  in  the  partial  likelihood  (2.6) 
and  investigate  the  properties  of  maximum  partial  likelihood  estimator  of  the  param- 
eter 0 = (/3,  a).  When  h{xi)  is  defined  by  (2.2),  the  partial  likelihood  (2.6)  can  be 
expressed  as 


2 n 


p£„{e)=niin 


s=l  i=l  0<u 


nn 


i=l  0<u 


Yu(u)exp(zjj3) 
E”=i  Yij(u)exp(zj/3) 

Y2i(u)exp(zi/3  Yxua) 


-1  AJV2i(u) 


(2.8) 


^2j(’‘)exp(zj/3  + x,ja) 


The  maximum  partial  likelihood  estimator  of  9 is  the  solution  of  the  equation 

U(«)  = = 0. 


(2.9) 


We  will  use  the  partial  likelihood  (2.8)  as  our  basis  for  inference  about  9 — {(3,  a). 

In  Theorem  2.1,  we  show  that  the  maximum  partial  likelihood  estimator  of  9 is 
unique  by  showing  that  log  partial  likelihood  is  a concave  function  of  0 = (/3,  a). 

Theorem  2.1 

Log  VCn{9)  in  (2.9)  is  a concave  function  of  9 = {(3,  a). 
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Proof 

For  simplicity,  let  us  assume  that  the  covariate  Z is  univariate.  The  proof  when  Z is 
multivariate  is  a straightforward  extension  of  the  univariate  case. 

We  will  prove  the  theorem  by  showing  that  the  negative  of  the  2x2  matrix  of 
second  partial  derivative  of  logVCn{0)  is  nonnegative  definite.  Now,  U(0)  in  (2.9) 
can  be  expressed  as 

TT//^\  94>2  d(j)3  . ^ 


where 


and 


)}dNui{u), 


)}dN,2i(u), 


03  — 


{logV2i{u)  + Zi(3  + xua 


log(J2y2j{n)e^^^^^^n}dN2i{u). 

t=i 


Hence,  if  H is  the  negative  of  the  second  partial  derivative  matrix  of  logVCn{0), 
then 

H = Hi+H2  + H3, 

where  Hi,  H2,  and  H3  are  the  negatives  of  second  partial  derivative  matrices  of  0i, 
02,  and  03,  respectively.  Hence,  H is  nonnegative  definite  if  Hj  is  nonnegative  definite 
for  j = 1,  2,  3.  That  the  Hj  are  nonnegative  definite  can  be  shown  as  follows. 

We  consider  the  case  of  H3.  The  proofs  for  Hi  and  H2  are  similar.  Differentiating 
the  negative  of  03  twice  with  respect  to  9 = (/3,  a),  we  get 

TT  ^11  hi2 
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where 


E”=i  y2j(u)zje 


ZjP+xua  ' 


E"=i  I ELi 


N2.{u)  = '^N2i{u), 


i=l 


Bj{u)  = 

IV, M = 

Ej=i  5j(^) 

n 

^i(“)  = 

j=l 


dN2i{u). 


Then 


Similarly, 


z{u)  = Wj{u)zj. 

i=l 


/ill  — 


- z{u)ydN2.{u). 


- z{u))dN2.{u) 


/122  — 


dA^2.(^)- 


and 


Hence,  if 
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f3j(«)  = 


Zj  - z(u) 

X\j  — Xi  (u)  J ’ 


then  H3  can  be  expressed  as 


H. 


-f 


j'=l 


dN2.{u). 


Consequently,  for  any  vector  y'  = (^1,2/2), 


r 

.••y'H3y=  / 

Jo 

-f 


^3Whj{u%j{u)y 

Lj=i 


dN2iu) 


Lt=i 


dN2.{u)  > 0, 


which  implies  that  H3  is  nonnegative  definite.  □ 


2.3.3  Asymptotic  Distribution  of  Maximum  Partial  Likelihood  Estimators 


Let  U(/3,  a,  t)  denote  the  first  derivative  of  log  of  the  partial  likelihood  (2.8),  where 
the  range  of  observation  period  is  restricted  to  the  interval  (0,tj.  That  is,  let 

dlogVCn{(d,  a,t) 


U{9,t)^U{f3,a,t)  = 


d{(3,a) 


where 


2 n 


p£„(/3.a,i)=nn  n 


s=l  i=l  0<u<f 


Yii{u)exp{zi/3) 
Ei=i  Yij{u)exp{zjf3) 


n n 

i=l  0<u<t 


Y2i{u)exp{zj/3  + x^a) 
ELi  Y2j{u)exp{zjf3  + xija) 


1 AA^2.(«) 


(2.11) 


The  asymptotic  distribution  of  Maximum  Partial  Likelihood  Estimators  (MPLEs) 
of  6 = {j3,  a)  can  be  derived  in  two  steps.  The  first  step  is  to  use  the  Martingale 
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Central  Limit  Theorem  to  show  that  converges  in  distribution  to  a 

Gaussian  process,  where  Oq  is  the  true  value  of  9.  The  second  step  is  to  use  a Taylor 
series  expansion  of  U(0,  t),  where  0 is  MPLE  of  9 and  show  that  the  joint  asymptotic 
distribution  of  the  MPLEs  is  a normal  distribution. 

For  the  purpose  of  ready  reference,  we  begin  with  the  statement  of  a version  of 
the  Martingale  Central  Limit  Theorem.  For  a proof  of  a slightly  restricted  version  of 
this  theorem,  see  Theorem  5.3.5  in  Fleming  and  Harrington  (1991). 

Theorem  2.2  (Martingale  Central  Limit  Theorem) 

Let  Cii'  (t)  be  a real-valued  continuous  function  of  t,  0 < t < oo,  for  all  l,l'  G {1 . . . r}. 
Suppose 

(1)  {Ni^i  : i = 1 ...  n,  I = 1 ...  r}  is  a multivariate  counting  process, 

(2)  Hii  is  a locally  bounded  predictable  process  for  all  i = 1 . . .n,  I = 1 . . .r, 

(3)  The  compensator  Ai^i  of  Ni  ^ is  continuous  for  all  i = 1 ...  n,  I = 1 ...  r, 

(4)  For  I = 1 ..  .r  and  for  t > 0, 


f Hi, i{s)d{Nt, 

i=i  -^0 


(s)  - A,i(s)}  = n 


« rt 
i=i  -^0 


Hi,i{s)dMi,i{s), 


where 


Mi,i{s)  = Ni,i{s)  - Ai,i{s) 


is  a local  square  integrable  martingale  such  that  for  each  l,l'  G {1  . . . r}  and  for 
all  t > 0; 


{Ui.UfHt) 


+ Op(l) 
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as  n ^ oo  and  for  any  e > 0, 


{Ui,,,Ui,){t)=n-^Y^  [ {H^{s)fl{\H^{s)\>e}dMs) 

= Op(l), 


as  n oo  . 

Then  as  n oo,  (Ui, . . . ,Ur)  converges  in  distribution  to  an  r-variate  Gaussian 
process  {Wi,...  ,Wr)  having  components  with  independent  increments,  VF/(0)  = 0 

a.s.,  E{Wi{t)}  = 0,  and  E{Wi{s)Wi'  (t)}  = Cn'  (s)  for  all  0 < s < t and  l,l'  G {1 . . . r}. 

□ 

The  asymptotic  distribution  of  n“^/^U(0o,  i)  in  the  case  where  Z is  univariate  is 
given  in  Theorem  2.3.  A straightforward  generalization  of  Theorem  2.3  yields  the 
corresponding  result  for  multivariate  Z. 

Theorem  2.3 

Let  Niii,Ni2i,N2i,yii,Y2i,Aiii,Ai2i,A2i,Zi,Xii,l3,  and  a be  defined  as  in  Section 
2.3.1,  and  9o  = {do,  oq)  be  the  true  value  of  9 = {d,  a).  Define 


Nu(u)  = Zi- 

E"=, 

1 

II 

I^j=l  T2j(u)2:je^jAo+iijao 

and 

22i{u)  Xu  ■ 

Assume  that  these  processes  satisfy  conditions  (1),  (2),  (3),  and  (4)  in  Theorem  2.2. 
Then  as  n ^ oo,  n~^^^lJ{9o,  t)  converges  in  distribution  to  a bivariate  normal  distri- 
bution with  zero  means. 
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Proof 

As  in  the  proof  of  Theorem  2.1,  we  have  the  expression 


U{9o,t) 


d(pi{t)  d(p2{t)  903(0 

96>o  ddo  96»o 


90(0 

900  ’ 


where  the  0j(O  are  the  versions  of  0j  when  the  range  of  observation  is  restricted  to 
[0,t].  Then 


U(9o,0 


9<P{t) 

900 

9<p{t) 

9ao 


5 


where 


i=l 


E?=i  ] 


d[Niii(u)  + A^i2i('a)]  + 


" J EJ=i  I 

f^Jo  EEi  J 


dN2i{u) 


and 


c'a=E 


EEi  J 


dN2i{u). 


Let 


Ej=i 

Ej=i>^ij(«)e"^'^''  ’ 


■£'21  (9o,  OiQ,  u) 


-£'22(9o,  Q!o,  w) 


EEl 

X]"=i 

E”=l 


MH(t)  = iViH(0  + iVi2i(0  - r Tn(n)e^'^‘>(A  oii(^)  + Aoi2(^))drx 

Jo 


Nii{t)  — Aii{t) 


32 


and 


M2i{t)  ^ N2i{t)  - [ Y2i{u)e^'f^°+^^'‘^no2{u)du 
Jo 

= N2i{^)  — ^2i{i)- 


Then  from  Section  1.3  in  Fleming  and  Harrington  (1991),  Mu{t)  and  M2i{t)  are 
martingales,  and  in  view  of  the  fact  that 

EJ=i  - El  (/?o,  u))Yij{u)e^^^^ 

YTj=i{zj  - E2i{(Jq,  c^o, 

E”=i  ~ ’ 


and 


YTj=i{^ij  - E22{^o,  ap, 

Yl^j=l  l'2j(u)e^J^0+^O“0  ’ 

Ub  and  Ua  have  the  martingale  structure, 

n pt  n 

Eb^'^  Hii{u)dMu{u) + Y^  H2ii{u)dM2i{u) 
1=1  1=1 


and 


H22i{u)dM2i{u). 


To  simplify  notation,  let  us  denote  n~'^Eu^  _ ^-i/2fj^  _ Then  the 

asymptotic  bivariate  normality  of  n“^/^U(0O)  = 'n'~^E(Ub,  Ua)  can  be  established  in 

the  following  three  steps: 


Stepl;  show  that  for  arbitrary  constants  Oi  and  02,  (1/1,  aii/2  + <222/3)  has  an  asymp 
totic  bivariate  normal  distribution  with  zero  means. 
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Step2;  show  that  {yi, 1/2,113)  has  an  asymptotic  trivariate  normal  distribution  with 
zero  means. 

Step3;  show  that  n Ug)  has  an  asymptotic  bivariate  normal  distribution 

with  zero  means. 

Now,  for  arbitrary  constants  aj  and  02,  aiy2  + 0-2V3  can  be  expressed  as 


Since  H2U  and  H22i  are  locally  bounded  and  predictable  by  assumption,  H2i  = 
o-itl2\i  + 02^/221  is  also  locally  bounded  and  predictable.  Also,  by  assumption,  {Nn  : 
i = 1, . . . n,  / = 1,2}  is  a multivariate  counting  processes  with  continuous  compen- 
sators. Hence,  it  follows  from  Theorem  2.2  that  (yi,aiy2  + 02^3)  converges  to  a 
bivariate  normal  distribution  as  n — > 00  with  zero  means  and  covariance  matrix 


n 


where 


H2i{u)  = aiZi  + a2Xii 


E"=i  F2j(u)e^i/5o+^o“o 


r^(f\  _ '-luv 

[cu{t)  C22(t) 


Cll(t)  Ci2(t) 


with  elements: 
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and 


C22{t)  = ~^E 
n 


[H2u{u)fd{M2i,  M2i){u) 


+ 


n 


iu)fd{M2i,M2i){u) 


H2li[u)H22j{u)d{M2i,  M2j)(u) 


Now,  by  assumption,  {Nu  : i = 1 . . . ra,  / = 1,  2}  is  a multivariate  counting  pro- 
cesses with  continuous  compensators.  Hence,  by  Theorem  2.5.2  in  Fleming  and  Har- 
rington (1991) 

{Mii,  Mii){u)  = i = l...n,  I = 1,2, 

{Mu,  M2j){u)  = 0,  i = 1 . . .n,  j = 1 . . .n. 


and 


{M2i,  M2j){u)  = 0,  z / j,  1 . . . n. 


Consequently, 
cii(t)  = n~^E 


[Hu{u)fdAu{u) 


Ej=i 

E”=i  Yij{u)e^i^o 


\ E”=i  J ) 


X 


: li 


\Qi{u)du 


2-2gZ,/3o 


Ej=i  ) 


\Qi{u)du 


Ci2{t)  = 0, 
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and 


C22(«)  = W2u{u)?dA^(u)j  + i«MuWdA,t(u) 


I 


( n 

^r2i(u)^2e"-^0+Xi.ao 


i=\ 


ELi  Y2j{u)zje 


• p^j  /^0“^“^li^0 


/ 


do 

+ -^£; 
n 


ft  / n 


^2  2i/3o+iiiao 


i=l 


(E”=i 

EJ=1  i^2j(«)a;ije^J^O+Xiiao 


Xo2{u)du 


Xo2{u)du 


2 2 I 2 2 
= «1%  +«2f^y3- 


To  implement  Step2  of  the  proof,  let  li  and  I2  be  arbitrary  constants.  Since 
iyi,o-iy2  + 022/3)  converges  to  a bivariate  normal  distribution  with  zero  means  and 
zero  covariance,  liUi  + l2{cL\y2  + O22/3)  converges  to  a normal  distribution  with  mean 
zero  and  variance  l\cii[t)  + llc22{t)  — ^\0'y^  + ^2(“i'^y2  For  given  constants 

61,  b2,  and  b^,  we  can  choose  li,  I2,  ai,  and  02  such  that  b^  — b2  = /2O1,  and 
bs  = hO‘2-  Hence,  it  follows  that  b\yi  +622/2  + 632/3  converges  to  a normal  distribution 
with  mean  zero  and  variance  b\uy^  +^2^y2  arbitrary  constants  61,  62,  and  63. 

Consequently,  by  the  Cramer-Wold  device,  which  is  stated  as  Lemma  5.2.1  in  Fleming 
and  Harrington  (1991),  (2/1,2/212/3)  converges  to  a trivariate  normal  distribution  with 
zero  means  and  covariance  matrix 


D{t)  = 


< 0 0 
0 < 0 
0 0 


Hence,  n ^^^(Ub,Ua)  converges  to  a bivariate  normal  distribution  with  zero  means 
and  covariance  matrix 


m 


cjy^  + a. 


V2 


(2.12) 


□ 

Finally,  the  asymptotic  normality  of  MPLEs  is  stated  in  Theorem  2.4  (3). 
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Theorem  2.4 

Let  0 be  the  MPLE  of  6,  and  9q  be  the  true  value  of  6.  Assume  the  conditions  of 
Theorem  2.3.  Then 

(1)  Let  E{t)  in  (2.12)  be  nonsinguiar.  A consistent  estimator  ofH{t)  is  provided  by 

where  1(0,  t)  is  the  information  matrix  defined  as  the  negative  of 
second  partial  derivative  of  log  partial  likelihood. 

(2)  0 is  a consistent  estimator  of  Oq.  That  is  0 converges  in  probability  to  Oq. 

(3)  ni/2(0-6>o)  converges  in  distribution  as  n — >■  oo  to  a mean  zero  bivariate  normal 

variable  with  covariance  matrix  T,~^(t). 

Outline  of  Proof 

Consistency  of  n~^l(9,t)  and  9 can  be  established  using  arguments  similar  to  those 
in  the  proofs  of  Theorem  8.2.1  and  8.3.1  in  Fleming  and  Harrington  (1991).  In 
particular,  results  based  on  Lenglart’s  Inequality  stated  as  Lemma  8.2.1  in  Fleming 
and  Harrington  (1991)  and  concavity  of  log  partial  likelihood  can  be  used  to  establish 
consistency  of  n~^l(9,t)  and  9,  respectively.  Then  the  asymptotic  normality  of  9 
can  be  established  using  Theorem  2.3  and  the  first  order  of  Taylor  series  for  U(0,  f) 
centered  around  9q: 

U(9,  i)  = U(9„,  t)  + Tu(9,  t)lg^g-(e  - 9o), 

where  9*  is  on  a line  segment  between  9 and  0q,  which  results  in 

n-‘l(r , t)n’/2(0  - 6>o)  = n-^^^V(9o,  t) 


since  U(0,  t)  = 0.  □ 


CHAPTER  3 

PERFORMANCE  OF  MAXIMUM  PARTIAL  LIKELIHOOD  ESTIMATORS  (MPLES) 


3.1  Introduction 


In  this  chapter,  we  will  report  the  results  of  a simulation  study  to  examine  some 
performance  characteristics  of  MPLEs.  In  the  simulation  study,  we 

1.  examine  the  effect  of  censoring  rates  on  MPLEs  using  bias,  mean  square  error 
(MSE),  and  mean  width  of  95%  confidence  interval, 

2.  use  histograms  to  study  the  adequacy  of  an  asymptotic  normal  approximation 
to  the  sampling  distributions  of  MPLEs, 

3.  use  the  bias,  MSE,  and  the  mean  width  of  95%  confidence  interval  to  study  the 
effect  on  MPLEs  of  treating  informative  censoring  as  random  censoring. 

3.1.1  General  Scheme 


For  given  N and  n,  we  simulated  N random  samples  of  size  n from  the  condi- 
tional distribution  of  the  trivariate  survival  time  {T,U,C)  defined  in  Section  2.2.1 
conditional  on  the  covariate  Z = z.  The  value  z of  Z was  selected  from  a Bernoulli 
(.5)  distribution.  For  selected  values  of  Aqi,  Aq2,  /?)  and  a,  the  conditional  (on  Z = z) 
hazards  at  T = xi,  U = «i,  and  U = U2  given  T = xi  were  assumed  to  be 

Xt{xi\z)  = Aoie^^, 

A[/(Mi12:)  = Ao2C''^,  (3.1) 
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and 

\u\t{U2\Xi,z)  = > Xi). 

Then  the  joint  density  of  T and  U was 

= fT{Xl\z)fuiT{u2\xi,z), 

where 

/r(ii|2)  = . ^°‘.  (A,i  + 

/^oi  + Ao2 
-7rAe-^^S 

7T  — -^01 

Aoi  + Ao2 

A = (Aoi  + Ao2)e^^, 

and 

fu\T{u2\xuz)  = I{U2  > Xi), 

where 

A*  = Ao2e^^“+"^. 

The  density  of  U was 

fu{ui\z)  = (1  - 7r)Ae“^“‘. 

The  required  samples  were  simulated  as  follows. 

1.  Generate  a value  2 of  Z from  a Bernoulli  (.5)  distribution. 

2.  Generate  a value  u from  a Uniform  (0,1)  distribution. 
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3.  (a)  If  M < 7T  then  generate  a value  xj  from  an  exponential  distribution  with  rate 
A,  and  generate  a value  U2  from  fu\T{u2\xx,  z). 

(b)  If  u > 7T  then  generate  a value  Ui  from  an  exponential  distribution  with  rate 
A. 

4.  Generate  a value  c of  C from  an  exponential  distribution  with  rate  Ac- 

For  the  sample  l,l  = 1 . . . N,  the  MPLE  of  9 = [f5,  a)  was  obtained  by  maximizing 
the  partial  likelihood  using  the  Newton-Raphson  method.  The  variance-covariance 
matrix  of  the  asymptotic  distribution  of  6i  was  estimated  from  the  sample  by 
inverting  the  observed  information  matrix  denoted  as  A;  = 1 < i,  j < 2,1  = 

I ...  A.  Let  6 — {9i . . . Gj^f).  The  E{9i)  was  estimated  by 


5 1 


N 


(=1 


The  actual  variance-covariance  matrix  of  6 was  estimated  by  the  sample  variance- 
covariance  matrix, 

1 ^ - 
^ (=1 

3.2  Simulation  Results  and  Discussion 
3.2.1  Effect  of  Different  Censoring  Rates 


In  order  to  examine  the  effect  of  censoring  rates  on  MPLEs,  we  divided  the  possible 
outcomes  into  four  categories  as  shown  in  Table  3.1.  Let  Pi{z)  be  the  conditional 
(given  Z = z)  probability  of  an  outcome  in  category  i,  i = 1 ...  4.  That  is 

Pi{z)  = P(outcome  in  category  i\z),  i = 1 . . .4. 
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Category 

1 

2 

3 

4 


Table  3.1.  Observable  outcome  by  category 
Outcome  characteristic  Description 


C <T,C  <U 
U <T,U  <C 
T <C  <U 
T <U  <C 


Randomly  censored 
Informatively  censored 
Randomly  censored  after  first  event 
Second  event  after  first  event 


Then  the  unconditional  probability  for  category  i is 


Pi  = Y^p{z)Pi{z),i  = 1 . . .4, 

Z 


where  p{z)  is  the  probability  mass  function  for  Z.  For  example,  when  Z has  a 
Bernoulli  (.5)  distribution, 


A = U 

Z ^ 


(Aqi  + Ap2)e^^  \ 

Ac  + (Aoi  + Ao2)e^^/ 


= (-5)  1 - 


(Aqi  + A02) 

Ac  + (Aoi  + A02) 


+ (-5)  1 - 


(Aqi  + Ap2)e^ 

Ac  + (Aoi  + Ao2)c^ 


Thus,  the  Pi  can  be  expressed  in  terms  of  Aoi,  A02,  Ac,  13,  and  a.  For  example,  when 
Aoi  = -5,  Ao2  = -25,  Ac  = .35,  (3  = .7,  and  a = 0,  each  Pi,  i = 1,  2, 3, 4 is  about  25%. 
The  different  values  of  Aoi,  A02,  K,  f3,  and  a were  needed  to  adjust  the  expected 
proportions  of  the  four  categories. 

Let  9j  denote  the  element  of  0 = (/?,  a),  j = 1,  2.  To  evaluate  the  performance 
of  MPLEs,  we  calculated  the  bias  and  MSB  of  9j  as 

Biasj  = 9j  — 9j 


and 


MSEj  = {9j  — 9j)'^  + Tjj, 
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where  Vjj  is  the  diagonal  element  of  R.  In  addition,  the  mean  width  of  100(1  — o;)% 
asymptotic  confidence  interval  for  9j  was  calculated  as 

^ 1 


N 


w 

' N 


i=i 


where 


One  thousand  random  samples  of  sizes  n = 50  and  n = 200  were  generated  with 
9i  = P = .7  and  92  = a = .7  from  five  different  distributions  for  (T,  U,  C)  corre- 
sponding to  the  five  combinations  of  the  Pi  listed  in  the  first  column  of  Table  3.2. 
Table  3.2  shows  the  bias,  MSE,  and  the  mean  width  of  95%  confidence  interval  in 


Table  3.2.  Bias,  MSE,  and  the  mean  width  when  n = 50  and  200,  where  P = a = .7 


Px 

P2 

^3 

A 

MPLE 

Bias 

n=50  n=200 

MSE 

n=50  n=200 

Mean  width 
n=50  n=200 

.63 

.18 

.10 

.09 

.038 

.005 

.30 

.06 

2.03 

.91 

6i 

13.669 

.127 

16330.02 

2.81 

4043.47 

25.11 

.13 

.61 

.03 

.23 

p 

.023 

.002 

.11 

.02 

1.21 

.57 

a 

-.584 

-.034 

353.21 

.23 

94.72 

1.64 

.24 

.03 

.59 

.14 

0 

.023 

.006 

.12 

.03 

1.30 

.62 

a 

.273 

.019 

56.64 

.12 

108.30 

1.23 

.09 

.20 

.12 

.59 

13 

.020 

.006 

.08 

.02 

1.04 

.49 

a 

.057 

.005 

.06 

.01 

.86 

.34 

.28 

.22 

.24 

.26 

P 

.017 

.006 

.11 

.03 

1.27 

.60 

a 

.025 

-.005 

2.05 

.21 

4.93 

1.73 

each  of  the  five  categories  with  sample  size  of  50  and  200.  When  the  sample  size  was 
200,  the  bias  and  MSE  of  both  P and  a were  reasonably  small.  However,  when  the 
sample  size  was  50,  the  bias  and  MSE  of  a became  large.  The  largest  bias  and  MSE 
of  a were  13.669  and  16330.02,  respectively,  and  both  correspond  to  the  high  Pi  case. 
The  bias  and  MSE  of  P in  this  case  were  still  reasonably  small.  This  indicated  that 
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inference  based  on  a was  not  recommended  when  the  sample  size  was  small.  Note 
that  the  smallest  MSEs  of  a occured  when  P4  was  high  in  both  sample  sizes.  The 
coverage  probabilities  of  95%  confidence  intervals  were  reasonably  close  to  95%  in  all 
cases,  where  the  worst  coverage  was  86%  of  intervals  for  a corresponding  to  a high 
P\  case  when  n = 50.  The  mean  widths  of  the  95%  confidence  intervals  for  /?  and  a 
were  much  narrower  when  the  sample  size  was  200  compared  to  when  the  sample  size 
was  50.  In  general,  regardless  of  the  sample  size,  the  mean  widths  of  95%  confidence 
intervals  for  a were  wider  than  those  for  j3  except  the  case  when  P4  was  high.  The 
largest  mean  widths  of  intervals  for  a when  the  sample  size  was  50  and  200  were 
4043.47  and  25.11,  respectively,  both  corresponding  to  a high  Pi  case.  In  summary, 
the  performance  of  MPLE  of  a was  best  when  P4  was  high  and  worst  when  Pi  was 
high,  while  the  performance  of  MPLE  of  (5  was  consistently  good,  with  the  worst 
performance  occurring  when  Pi  was  high. 

3.2.2  Normal  Approximation  to  the  Sampling  Distribution 

Figures  3.1,  3.2,  and  3.3  show  histograms  of  the  MPLEs  of  (3  and  a when  1000 
random  samples  were  generated  from  population  with  (3  = a = .7,  Aqi  = .4,  A02  = -3, 
and  Ac  = .1,  respectively.  These  values  correspond  to  the  expected  proportions  of  .39 
and  .16  of  informatively  censored  and  randomly  censored  observations,  respectively. 
When  the  sample  size  was  large  (n  = 200),  Figures  3.1  (a)  and  (b)  both  indicate  that 
the  sampling  distributions  of  (3  and  a were  well  approximated  by  normal  distributions. 
When  the  sample  size  was  small  (n  = 50),  Figure  3.2  still  shows  a reasonable  normal 
approximation  to  the  sampling  distribution  of  j3.  But  in  Figure  3.3,  the  sampling 
distribution  of  a appeared  to  be  slightly  skewed,  where  the  minimum  and  maximum 
of  a were  —1.13  and  3.05,  respectively.  This  was  not  surprising  since  a was  estimated 
only  from  category  4,  while  /?  was  estimated  from  three  categories,  2,  3,  and  4. 
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Figures  3.4  and  3.5  show  the  histograms  of  the  MPLEs  of  a based  on  sample  size 


of  200.  When  the  probability  of  F4  was  high  (=  60%),  Figure  3.4  shows  a good  normal 
approximation  to  the  sampling  distribution  of  a,  where  the  minimum  and  maximum 
were  .43  and  1.03,  respectively.  But  when  P\  was  high  (=  60%),  Figure  3.5  shows  a 
poor  normal  approximation,  where  the  minimum  and  maximum  were  —.80  and  51.04, 
respectively.  Note  that  the  maximum  of  51.04  was  out  of  range  in  Figure  3.5.  In  view 
of  that  a is  a parameter  measuring  association  between  T and  U , it  is  reasonable  to 
expect  that  the  performance  of  MPLE  of  a should  be  good  in  the  high  P4  case,  where 
both  T and  U are  observed,  and  poor  in  the  high  P\  case,  where  neither  T nor  U is 
observed. 

3.2.3  Effect  of  Ignoring  Informative  Censoring 

To  examine  the  effect  of  treating  informative  censoring  as  random  censoring  on 
MPLE  of  /3,  we  generated  1000  random  samples  of  sizes  n = 50  and  n = 200  from 
the  population  with  a = .7,  a specified  value  of  /?,  Aoi  = .4,  A02  = -3,  and  Ac  = .1, 
respectively.  The  value  of  P varied  from  —.7  to  .7.  For  each  selected  sample,  the 
MPLEs  were  calculated  once  assuming  model  (3.1)  and  then  again  treating  informa- 
tive censoring  as  random  censoring.  The  reduction  in  bias  when  informative  censoring 
was  accounted  for  in  the  model  was  calculated  as 


where  Bias-N  and  Bias-I  were  the  biases  when  informative  censoring  was  not  ac- 
counted for  and  was  accounted  for  in  the  model,  respectively.  The  reductions  of  MSE 
and  the  mean  width  were  similarly  calculated.  The  reductions  in  bias,  MSE,  and 
the  mean  width  when  informative  censoring  was  taken  into  account  in  the  model 
are  shown  in  Table  3.3.  Regardless  of  the  true  value  of  /?  and  sample  size,  when 
informative  censoring  was  accounted  for  in  the  model,  the  reductions  of  MSE  were 


Frequencies  Frequencies 
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(a)  Plot  of  MPLE  of  beta  n=200 


o 

lO 

CM 


o 

in 

o 


0.0  0.5  0.7  1.0  1.5 

MPLE  of  beta 


(b)  Plot  of  MPLE  of  alpha  n=200 


0.0  0.5  0.7  1.0  1.5 

MPLE  of  alpha 


Figure  3.1.  Plots  of  MPLE  when  n = 200 


Frequencies 
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Plot  of  MPLE  of  beta  n=50 


I I I ' I I I I 

-0.5  0.0  0.5  0.7  1.0  1.5  2.0  2.5 

MPLE  of  beta 


Figure  3.2.  Plot  of  MPLE  of  /?  when  n = 50 


Frequencies 
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Plot  of  MPLE  of  alpha  n=50 
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Figure  3.3.  Plot  of  MPLE  of  a when  n = 50 


Frequencies 


47 


Plot  of  MPLE  of  alpha 


MPLE  of  alpha 


Figure  3.4.  Plot  of  MPLE  of  a when  P4  is  high 
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Plot  of  MPLE  of  alpha 
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Figure  3.5.  Plot  of  MPLE  of  a when  Pi  is  high 
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Table  3.3.  Reductions  in  bias,  MSE,  and  mean  width  of  confidence  interval  for  /3 


Bias 

MSE 

Width 

p 

n 

Bias-P 

Red^-bias 

MSE-I 

Red-MSE 

MeanW"-I 

Red- Mean  W 

50 

-.029 

32.24 

.095 

35.37 

1.17 

19.22 

-.7 

200 

-.008 

9.72 

.021 

33.58 

.55 

17.27 

50 

-.013 

30.39 

.082 

34.35 

1.09 

18.45 

-.3 

200 

-.003 

3.04 

.019 

32.19 

.50 

16.75 

0 

50 

-.004 

13.35 

.077 

31.91 

1.06 

18.04 

200 

-.001 

-9245.45 

.018 

31.79 

.50 

16.53 

50 

.008 

26.78 

.078 

32.78 

1.06 

18.01 

.3 

200 

.000 

82.16 

.018 

32.47 

.50 

16.49 

.7 

50 

.024 

29.07 

.084 

32.19 

1.09 

18.20 

200 

.005 

42.35 

.019 

34.41 

.52 

16.74 

“Informative  Censoring 
^’Reduction  in  percentage 
“Mean  width 


consistently  about  30%.  The  reductions  of  bias  varied  somewhat  with  a large  increase 
occurring  when  P = 0 and  n = 200.  However,  since  the  Bias-I  was  still  very  small, 
the  resulting  increase  may  not  be  of  practical  importance.  Except  this  case,  the  bias 
was  reduced  when  informative  censoring  was  taken  into  account  in  the  model.  The 
coverage  probabilities  of  95%  confidence  intervals  for  /?  were  all  close  to  95%  regard- 
less of  the  true  value  of  P and  sample  size  when  informative  censoring  was  accounted 
or  not  accounted  for  in  the  model  (See  Table  3.4).  The  reductions  of  the  mean  width 
of  95%  confidence  interval  were  consistently  around  20%.  Overall,  when  informative 
censoring  was  taken  into  account  in  the  model,  the  performance  of  P was  consistently 
better.  That  is,  the  bias  was  reduced,  the  MSE  was  smaller,  and  the  mean  width  of 
the  confidence  interval  were  smaller  while  maintaining  the  95%  coverage  probability. 
With  respect  to  the  MPLE  of  a,  the  bias,  MSE,  and  mean  width  of  the  95%  confi- 
dence interval  were  examined  (See  Table  3.5).  There  were  no  significant  reductions 
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Table  3.4.  Coverage  Probability  of  95%  confidence  intervals  for  f3 


n 

Accounted" 

Not  accounted'’ 

-.7 

50 

.95 

.95 

200 

.94 

.94 

-.3 

50 

.94 

.95 

200 

.95 

.94 

0 

50 

.94 

.95 

200 

.94 

.95 

.3 

50 

.95 

.95 

200 

.94 

.95 

.7 

50 

.95 

.96 

200 

.95 

.95 

“Informative  censoring  accounted 
^'Informative  censoring  not  accounted 


in  MSE,  i.e.,  they  were  all  less  than  2%,  and  the  reductions  in  mean  width  were  all 
less  than  1%.  The  bias  was  slightly  increased  mostly  by  less  than  9%  with  a few  cases 
close  to  16%.  The  coverage  probabilities  of  95%  confidence  intervals  for  a were  all 
reasonably  close  to  95%  (See  Table  3.6). 
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Table  3.5.  Reductions  in  bias,  IV 


SE, 


and  mean  width  of  confidence  interval  for  a. 


Bias 

MSE 

Width 

a 

n 

Bias-I“ 

Red''-bias 

MSE-I 

Red-MSE 

MeanW'-l 

Red-MeanW 

50 

.067 

.50 

.145 

1.84 

1.34 

.55 

-.7 

200 

.007 

-3.24 

.015 

-.03 

.48 

.26 

50 

.079 

-1.17 

.172 

.54 

1.38 

.46 

-.3 

200 

.010 

-4.44 

.017 

.02 

.50 

.13 

50 

.071 

-2.85 

.163 

.88 

1.43 

.46 

0 

200 

.007 

-8.74 

.020 

.21 

.53 

.10 

50 

.081 

-2.91 

.191 

.79 

1.52 

.48 

.3 

200 

.005 

-16.39 

.021 

.50 

.57 

.14 

50 

.089 

-2.21 

.257 

1.75 

1.68 

.58 

.7 

200 

.004 

-13.71 

.026 

1.31 

.62 

.29 

“Informative  Censoring 
**  Reduction  in  percentage 
“Mean  width 


Table  3.6.  Coverage  Probability  of  95%  confidence  intervals  for  a. 


n 

Accounted" 

Not  accounted** 

-.7 

50 

.96 

.96 

200 

.96 

.96 

-.3 

50 

.97 

.96 

200 

.96 

.95 

0 

50 

.95 

.95 

200 

.95 

.95 

.3 

50 

.95 

.95 

200 

.96 

.96 

.7 

50 

.94 

.94 

200 

.95 

.95 

“Informative  censoring  accounted 
‘’Informative  censoring  not  accounted 


CHAPTER  4 

RESIDUAL  PLOTS  FOR  MODEL  CHECKING 
4.1  Introduction 


One  advantage  resulting  from  the  use  of  counting  process  representation  is  that 
residuals  can  be  expressed  in  a natural  way  as  the  observed  number  of  events  minus  the 
conditionally  expected  number  of  events.  In  this  chapter,  we  will  examine  the  forms 
of  martingale  and  score  residuals  arising  from  the  counting  process  representation 
and  show  how  they  can  be  used  to  assess  the  adequacy  of  fit  of  the  model  (2.3). 

For  the  proportional  hazards  model,  a well  known  residual  arising  from  the  fitted 
hazard  Xo{t)exp{z/3)  is  the  Cox-Snell  (1968)  residual  defined  as 

A{ti)  = Ao{ti)exp{zi0) 


at  the  observed  failure  time  for  subject.  Here  Ao(tj)  is  the  estimated  cumulative 
baseline  hazard  function.  The  Cox-Snell  residual  estimates  the  cumulative  hazard 
function.  Since  the  cumulative  hazard  function  has  a unit  exponential  distribution, 
the  plot  of  Cox-Snell  residuals  should  approximate  the  distribution  of  a unit  expo- 
nential when  the  correct  model  is  fitted.  However,  Lagakos  (1980)  showed  that  the 
Cox-Snell  residual  is  not  very  reliable  in  assessing  the  adequacy  of  a fitted  model 
because  its  distribution  can  be  quite  different  from  a unit  exponential  distribution, 
especially  in  small  samples,  even  when  the  correct  model  is  fitted  to  the  data.  Con- 
sequently, we  will  restrict  our  attention  to  the  use  of  martingale  and  score  residual 
plots. 
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In  Section  4.2,  the  form  of  martingale  and  score  residuals  are  described,  and  in 
Section  4.3,  a simulation  study  is  used  to  examine  how  plots  of  these  residuals  can 
be  used  to  check  the  adequacy  of  the  fit  of  the  model  (2.3). 

4.2  Martingale  and  Score  Residuals 


In  Section  2.3,  we  have  seen  that 


— N\i 


Yu{u)e'''^dAoi{u)  = Nu{t) 


Aii{t) 


and 


M2i{t)  = N2i{t)  - f = N2i{t)  - A2i{t),  (4.1) 

Jo 

where  Aqi  and  A02  are  the  cumulative  baseline  hazard  functions,  are  martingales 
under  the  compensators  (2.7).  Note  that  when  these  compensators  are  correct,  these 
quantities  can  be  interpreted  as  the  differences  over  [0,  t]  between  the  observed  number 
of  failures  and  a conditionally  expected  number  of  failures  for  the  subject. 

In  order  to  define  martingale  residuals,  the  estimators  of  baseline  hazards  are 
needed  for  estimation  of  compensators  An  and  A2i-  Since  the  partial  likelihood  (2.8) 
cannot  be  used  to  estimate  the  baseline  hazard  functions,  Aoi  and  A02,  we  will  estimate 
Aoi  and  A02  using  the  Breslow  (1972,  1974)  estimators: 

Ao,(«)  = Tf ^ 


dN2i{u) 

Ej=i  Y2j{u)exp{zj^  + 


and 
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where  (3  and  a are  the  MPLEs  of  /3  and  a,  respectively.  Then  martingale  residuals 


for  the  first  and  second  events  are  defined  as 


Mu{t)  = Nu{t) 


I 


Yii{u)e^'^dkoi{u)  = 


and 


M2i{t)  = N,i{t)  - [ Y2i{u)e^‘^+^^^^dAo2{u)=N2i{t)-A2iit),  (4.2) 


Jo 


respectively. 

The  martingale  residuals  can  be  used  to  assess  the  adequacy  of  an  assumed  model 
and  the  appropriateness  of  the  form  of  a covariate  in  a model.  A drawback  of  the 
martingale  residual  is  that  its  distribution  is  not  symmetric  about  0.  Since  N{t) 
equals  one  or  zero  according  as  t is  an  uncensored  or  censored  observation,  each 
martingale  residual  takes  on  values  between  — oo  and  one,  with  the  residuals  for 
censored  observations  being  negative. 

The  score  residuals  provide  an  alternative  to  martingale  residuals.  Corresponding 
to  the  partial  likelihood  of  (2.11)  with  a univariate  z,  the  2-dimensional  score  vector 
residual  is  defined  as  the  vector 


where 


n 


dM2i{u) 
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dM2i{u) 


= E^-fW- 

i=l 


and  Mil  and  M2i  are  the  martingale  residuals. 

Let  {zi  — z)x  residualj  denote  the  sum  of  {zi  — Zi)  x Mu  and  {zi  — Z2)  x M2i-  Note 
that  Uiy{t)  and  Ua{t)  are  the  sums  of  {zi  — z)  x residualj  and  (xi  — xi)  x residualj, 
where  zi,  Z2,  and  xi  are  the  weighted  risk  set  means.  For  subject,  Ubi{t)  and  Uai{t) 
measure  the  influence  of  a value  of  the  covariate  and  the  fit,  and  hence  the  adequacy 
of  the  assumed  model. 


4.3  A Simulation  Study  of  Martingale  and  Score  Residuals 


In  this  section,  we  will  use  simulated  martingale  and  score  residuals  to  illustrate 
how  these  residuals  can  be  used  to  test  the  adequacy  of  the  model  (2.3).  First,  we 
will  simulate  and  examine  the  martingale  and  score  residuals  from  a correctly  fitted 
model.  Then  we  will  simulate  martingale  and  score  residuals  from  an  incorrectly  fitted 
model  and  examine  the  effect  of  incorrect  model  on  both  residuals.  Specifically,  we 
will  use  plots  of  martingale  and  score  residuals  to  assess  the  adequacy  of  the  assumed 
model,  and  use  martingale  residual  plots  to  determine  the  appropriateness  of  the  form 
of  a covariate  in  the  model. 

4.3.1  Checking  the  Adequacy  of  the  Model 


As  in  Chapter  3,  survival  times  were  simulated  with  sample  size  200,  a = (3  = 
.7,  Aoi  = .4,  Ao2  = .3,  Aoc  = -1  in  the  model  (3.1),  and  with  covariate  Z = Zj, 
where  Zi  is  a standard  normal  random  variable.  Figures  4.1  (a)  and  (c)  show  the 
martingale  residuals  of  the  observed  times  of  the  first  events  plotted  against  the 
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ranks  of  the  observed  times  when  the  correct  (Z  ~ Xi)  the  incorrect  {Z  ~ 
A'’(0, 1))  models  are  fitted  to  the  data,  respectively.  Figures  4.1  (b)  and  (d)  show  the 
corresponding  plots  for  the  observed  times  of  second  events.  As  seen  from  the  plots, 
the  martingale  residuals  become  smaller  at  later  times  since  the  number  of  failures 
increases  with  increasing  time  - making  Aq  increase  as  time  progresses.  Since  the 
percent  of  first  events  censored  was  7.5%,  in  Figures  4.1  (a)  and  (c),  a majority  of 
martingale  residuals  are  in  the  upper  clusters.  The  percent  of  second  event  censored 
was  10.6%.  When  an  incorrect  Z is  used,  the  corresponding  f3  will  be  close  to  zero  so 
that  the  estimated  compensators  will  be  close  to  the  Breslow  estimate  of  the  baseline 
hazard.  Consequently,  the  martingale  residuals  are  roughly  N{t)  minus  the  expected 
value  of  exponential  order  statistics.  Thus,  reduced  variability  in  martingale  residuals 
is  an  indication  of  the  inadequacy  of  Z.  Notice  that,  as  expected,  compared  to  Figure 

4.1  (a).  Figure  4.1  (c)  shows  very  little  variation.  However,  it  is  difficult  to  detect  a 
difference  in  Figures  4.1  (b)  and  (d)  because  of  a correct  form  of  Xi  and  an  incorrect 
form  of  Z = Zi. 

Figures  4.2  (a)  and  (b)  are  the  score  residuals  from  the  correct  model  plotted 
against  the  covariates  Z and  Xi,  respectively.  Since  the  residuals  correspond  to  the 
correct  model,  both  plots  should  show  a random  scatter  of  points  centered  around 
zero.  As  expected  from  the  correctly  fitted  model,  there  are  no  discernible  patterns 
in  Figures  4.2  (a)  and  (b).  In  Figures  4.2  (c)  and  (d),  the  score  residuals  from  the 
incorrect  model  are  plotted  against  the  covariates  Z and  Xi,  respectively.  In  Figure 

4.2  (c),  there  is  a pattern  showing  observations  with  small  and  large  score  residuals 
associated  with  small  and  large  values  of  Z,  respectively.  This  is  as  expected  because 
the  correct  model  corresponds  to  a quadratic  function  of  Z.  In  Figure  4.2  (d),  there 
is  no  systematic  pattern  since  the  form  of  Ah  is  correct. 
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4.3.2  Checking  the  Appropriateness  of  the  Form  of  a Covariate 

Fleming  and  Harrington  (1991)  showed  that  a smoothed  plot  of  the  martingale 
residuals,  computed  by  omitting  a covariate,  will  show  approximately  the  correct 
functional  form  for  that  covariate  when  plotted  against  the  omitted  covariate.  In 
Figures  4.3,  the  martingale  residuals  computed  by  omitting  Z are  plotted  against  the 
omitted  Z.  In  Figures  4.4,  the  martingale  residuals  computed  by  omitting  are 
plotted  against  the  omitted  X\.  The  scatter  plots  were  smoothed  by  using  LOWESS 
in  the  language  S.  Since  the  correct  model  is  fitted  in  Figures  4.3  (a)  and  (b),  the 
smoothed  martingale  residuals  should  show  approximate  linearity.  This  is  indeed  the 
case  in  both  plots.  In  Figures  4.3  (c)  and  (d),  there  is  an  approximate  quadratic  trend 
indicating  that  the  correct  form  is  Z = Zj  rather  than  Z — Z\.  Both  Figures  4.4 
(a)  and  (b)  do  not  show  linear  or  quadratic  pattern,  which  implies  that  the  incorrect 
form  of  Z = Zi  does  not  seem  to  influence  the  smoothed  martingale  residuals  plots 
against  X\. 

4.4  Discussion 


The  simulation  studies  of  martingale  and  score  residuals  show  that  both  types  of 
residuals  are  useful  for  detecting  an  inadequacy  of  an  assumed  model.  Although  not 
reported,  when  the  fitted  model  includes  an  incorrect  form  of  Xi,  it  is  not  very  clearly 
reflected  in  the  martingale  and  score  residual  plots.  Further  studies  are  needed  to 
investigate  other  aspects  of  the  model,  such  as  the  proportional  hazards  assumption 
using  martingale  or  score  residuals. 


(a)  Correct  Model:  First  Event  (b)  Correct  Model:  Second  Event 
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Figure  4.1.  Plots  of  martingale  residuals  against  the  ranks  of  the  observed  times 


(a)  Correct  Model:  Score_Z  vs.  Z (b)  Correct  Model:  Score_X1  vs.  X1 
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Figure  4.2.  Plots  of  score  residuals  against  covariates 


(a)  Correct  Model:  First  Event  (b)  Correct  Model;  Second  Event 
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Figure  4.3.  Plots  of  martingale  residuals  vs.  omitted  covariates.  LOWESS  smooths  use  a span  of  .2 


martingale  martingale 
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(a)  Correct  Model;  Second  Event 


(b)  Incorrect  Z:  Second  Event 


Figure  4.4.  Plots  of  martingale  residuals  vs.  omitted  covariates.  LOWESS  smooths 
use  a span  of  .2 


CHAPTER  5 

APPLICATION  TO  REAL  DATA 
5.1  Introduction 


In  this  chapter,  we  demonstrate  the  use  of  model  (2.3)  with  data  from  a Pediatric 
Oncology  Group  (Gainesville,  FL)  study  of  leukemia. 

The  study  was  conducted  to  compare  BMT  and  Chemotherapy  as  treatments  for 
childhood  acute  myelocytic  leukemia.  Eligible  patients  were  randomized  to  receive 
BMT  or  Chemotherapy  treatment.  We  will  use  the  data  on  a subset  of  111  patients 
who  were  randomized  to  BMT  to  (i)  assess  the  association  between  waiting  time  to 
BMT  and  patients  age  (<  2 years  and  > 2 years)  and  gender,  and  (ii)  the  effect  of 
the  waiting  time  to  BMT  on  a patient’s  survival  time. 

One  hundred  and  six  patients  were  followed  from  the  date  of  remission  to  the  date 
of  endpoints  of  interest  or  the  date  of  loss  to  followup.  The  numbers  of  patients  by 
age-group  (<  2 years  and  > 2 years)  and  gender  are  summarized  in  Table  5.1.  There 

Table  5.1.  Frequencies  of  age-group  by  gender 


< 2 years 

> 2 years 

male 

12 

42 

54 

female 

13 

39 

52 

25 

81 

106 

were  54  male  and  52  female  patients,  and  25  and  81  patients  in  the  age-group  of  < 
2 years  and  > 2 years,  respectively.  The  numbers  of  patients  experiencing  the  three 
types  of  events  - BMT,  Failure  (relapse  or  death)  before  BMT,  Failure  after  BMT 
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- are  listed  in  Table  5.2.  As  can  be  seen  from  Table  5.2,  there  were  30  informative 
censorings  and  41  random  censorings. 


Table  5.2.  Frequencies  of  Events 


Event 

Number  of  patients 

BMT 

71 

Failure  before  BMT 

30 

Randomly  censored  before  BMT 

5 

Failure  after  BMT 

35 

Randomly  censored  after  BMT 

36 

5.2  Model  and  Results 


The  Model 


The  data  (failure  times  in  weeks)  were  analyzed  using  the  model 


and 


(5.1) 


where 

_ J 0 if  age  < 2 years 

( 1 if  age  > 2 years, 

_ f 0 if  male 

I 1 if  female, 

and 

xi  = time  to  BMT. 


The  associated  counting  processes  and  their  compensators  are  as  in  Table  5.3. 
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Table  5.3.  The  Counting  Process  Model 


Event 

Counting  process 

Compensator 

BMT,  Failure  before  BMT 
Failure  after  BMT 

iVi(f) 

N2{t) 

According  to  Model  (5.1),  the  relative  risk  of  Failure  at  time  t of  a patient  whose 
waiting  time  for  BMT  is  Xi  relative  to  a patient  whose  waiting  time  is  x\  is 


Rf(xux\)  = 


^02  (^) 


(5.2) 


Similarly,  the  relative  risk  of  BMT  at  time  t for  a patient  with  covariates  z = {zi,Z2) 
and  who  is  eligible  for  BMT  at  time  t relative  to  a patient  with  covariates  z = (^1, 22) 
is 


Rbmt{z,  z ) 


(5.3) 


where  (3  = (/3i,/32).  Note  that  (5.3)  is  also  the  corresponding  relative  risk  of  Failure 
before  BMT.  Both  risks  (5.2)  and  (5.3)  can  be  estimated  using  MPLEs  of  a and  0. 
In  addition  to  estimating  the  relative  risks,  approximate  95%  confidence  limits  can 
be  computed  as 

^(xi-x[){a±l.96^yv{a)) 

for  Rf{xi,x\)  and 

g((^i-2))/3i+(^2-4)/52)±l-96\/v(z^) 

for  i?BMr(z,z'),  where  V{a)  and  V{z^)  are  the  estimated  variances  of  a and  (zi  — 
^'1)^1  + (^2  — -22)^2,  respectively. 
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Results 


The  model  (5.1)  in  counting  process  style  was  fitted  using  the  PHREG  procedure 
in  SAS,  version  6.12.  The  SAS  code  is  listed  in  appendix.  The  resulting  output  is 
shown  in  the  following. 


Testing  Global  Null  Hypothesis:  THETA=0 


Without 

Criterion  Covariates 


With 

Covariates 


Model  Chi-Square 


-2  LOG  L 

Score 

Wald 


1041.486  1038.457  3.029  with  3 DF  (p=0.3872) 

2.959  with  3 DF  (p=0.3980) 
2.954  with  3 DF  (p-0.3987) 


Analysis  of  Maximum  Likelihood  Estimates 


Parameter 

Standard 

Wald 

Pr  > 

Variable 

DF 

Estimate 

Error 

Chi-Square 

Chi-Square 

AGEGRP 

1 

0.017877 

0.21452 

0.00694 

0 . 9336 

GENDER 

1 

0.248193 

0.17660 

1.97517 

0.1599 

XI 

1 

-0.076462 

0.06727 

1.29178 

0.2557 

Estimated  Covariance  Matrix 

AGEGRP  GENDER 


XI 


AGEGRP  0.0460186988 
GENDER  0.0004353121 
XI  -.0015272737 


0.0004353121 

0.0311869383 

-.0012982215 


-.0015272737 

-.0012982215 

0.0045258603 


The  global  null  hypothesis  of  (/3,  a)  = 0 is  not  significant  (p-value  = .3872)  implying 
that,  on  the  whole,  the  covariates,  Zi,  Z2,  and  are  not  useful  for  predicting  failure 
rates.  However,  for  the  purpose  of  illustration,  we  will  continue  interpreting  the  result 
based  on  the  assumed  model.  The  negative  sign  of  a indicates  that  the  relative  risk  of 
Failure  after  BMT  increases  with  increased  waiting  time  for  BMT.  Using  a = —.076, 
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the  estimated  relative  risk  of  a patient  whose  waiting  time  for  BMT  is  xi  relative 
to  a patient  who  has  to  wait  A weeks  longer  is  e Figure  5.1  shows 

the  estimated  relative  risk  against  A.  As  expected,  longer  waiting  time  (larger  A) 
increases  the  risk  monotonically.  For  example,  the  estimated  relative  risk  of  Failure 


w 

'i. 

0) 

> 

o 


delta 


Figure  5.1.  The  estimated  relative  risk  of  a patient  whose  BMT  time  x\  relative  to 
longer  waiting  time  A. 

after  BMT  at  time  t of  a patient  whose  waiting  time  for  BMT  is  X\  relative  to  a 
patient  who  has  to  wait  for  one  month  longer  (A  = 4)  is  1.36  with  a 95%  confidence 
limit  of  (.80,  2.29).  Using  j3i  — .018  and  /?2  = .248,  we  obtain  the  estimated  relative 
risk  of  BMT  (Failure  before  BMT)  for  a patient  with  covariates  (2:1, 22)  relative  to  a 
male  patient  whose  age  < 2 years  (zi  — Z2  = 0)  as 
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Table  5.4  summarizes  the  estimated  relative  risks  for  different  combinations  of  values 
of  zi  and  Z2.  As  shown  in  Table  5.4,  the  relative  risks  of  BMT  (Failure  before  BMT) 
are  not  significantly  higher  relative  to  a male  patient  whose  age  < 2 years  since  all 
95%  confidence  limits  include  the  relative  risk  of  one. 


Table  5.4.  Relative  risks  and  the  associated  95%  confidence  limits  of  BMT  (Failure 
before  BMT)  relative  to  a male  patient  whose  age  < 2 years 


Zl 

22 

relative  risk 

95%  confidence  limit 

< 2 years 

Male 

0 

0 

1.00 

— 

< 2 years 

Female 

0 

1 

1.28 

(.91,  1.81) 

> 2 years 

Male 

1 

0 

1.02 

(.67,  1.55) 

> 2 years 

Female 

1 

1 

1.30 

(.76,  2.25) 

5.3  Checking  Residual  Plots 

In  this  section,  we  will  use  the  plots  of  martingale  and  score  residuals,  and  examine 
if  any  inadequacy  aspect  of  the  model  is  reflected  in  either  residual  plot.  Although 
none  of  the  covariates  appeared  to  be  significant,  fitting  the  time-dependent  prod- 
uct terms  of  age-group  x log(time)  and  Xi  x log(time)  revealed  that  age-group  x 
log(time)  and  Xi  x log(time)  are  significant  at  a = .10.  Therefore,  the  proportional 
hazards  assumption  may  not  hold.  We  will  examine  if  violation  of  this  assumption  is 
indicated  by  the  martingale  and  score  residual  plots. 

Figures  5.2  (a)  and  (b)  are  the  plots  of  martingale  residuals  from  the  real  data 
against  the  ranks  of  the  observed  times.  Percents  of  first  and  second  events  censored 
were  4.72%  and  50.70%,  respectively.  For  comparison.  Figures  5.2  (c)  and  (d)  are  the 
plots  of  simulated  martingale  residuals  against  the  ranks  of  the  observed  times  from 
the  similar  setting  of  the  real  data  where  percents  of  first  and  second  events  censored 
were  5.45%  and  45%,  respectively  based  on  110  subjects.  The  parameters  of  /3i,  (32, 
and  a were  chosen  to  be  .7.  Compared  to  Figure  5.2  (c).  Figure  5.2  (a)  appears  to 
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be  less  dispersed  in  later  times.  Compared  to  Figure  5.2  (d),  Figure  5.2  (b)  displays 
a pattern  that  about  half  of  the  observations  failed  in  early  times  and  with  the  other 
half  of  the  observations  censored  in  later  times.  Censored  observations  here  may  be 
corresponding  to  the  end  of  study  censorings.  This  pattern  does  not  seem  to  be  of 
concern. 

To  examine  the  appropriate  form  of  covariates,  Figures  5.3,  Figures  5.4  (a),  and 
(b)  are  displayed  to  show  the  smoothed  plots  of  martingale  residuals,  computed  by 
omitting  a covariate,  plotted  against  the  omitted  covariate.  To  examine  the  appro- 
priate form  of  age  treated  as  a continuous  variable.  Figures  5.3  (a)  and  (b)  show  the 
smoothed  martingale  residuals  computed  by  omitting  age,  plotted  against  the  age. 
Figure  5.3  (c)  is  the  corresponding  plot  for  X\.  In  these  plots,  there  are  no  clear 
detectable  patterns.  Figures  5.4  (a)  and  (b)  are  displayed  for  completeness,  although 
the  plots  are  not  very  useful  to  examine  the  appropriateness  of  indicator  covariate 
gender.  In  Figure  5.4  (c),  score  residuals  for  A'l  are  plotted  against  Xi.  The  plot 
against  age-group  and  gender  are  omitted  because  they  are  not  informative  due  to 
the  dichotomy  of  both  variables.  In  Figure  5.4  (c),  there  is  a pattern  of  fanning  out  in 
the  area  of  small  and  large  values  of  Ah,  which  may  be  an  indication  that  including 
covariate  Ah  in  the  assumed  model  may  not  be  adequate. 

The  overall  conclusion  is  that  there  is  no  particular  aspect  of  the  inadequacy  of  the 
fitted  model  shown  in  the  martingale  and  score  residual  plots.  In  particular,  violation 
of  proportional  hazards  assumption  does  not  seem  to  be  reflected  in  the  plots  we  have 
examined.  There  may  be  other  types  of  residuals  which  may  detect  violation  of  this 
assumption. 

We  conclude  this  chapter  by  noting  that  BMT  data  is  merely  one  example  of  the 
failure  time  data,  where  the  major  failure  variable  is  subject  to  informative  censoring. 
Our  method  can  be  applicable  in  a numerous  studies,  for  example,  cancer  studies. 
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where  time  to  cancer  is  subject  to  time  to  death  (informative  censoring),  as  well  as 
AIDS  and  cardiovascular  disease  studies. 


(a)  Real  Data:  First  Event  (b)  Real  Data:  Second  Event 
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Figure  5.2.  Plots  of  martingale  residuals  against  the  ranks  of  the  observed  times 


(a)  Real  Data:  First  Event  (b)  Real  Data:  Second  Event 
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Figure  5.3.  Plots  of  martingale  residuals  vs.  omitted  covariates.  LOWESS  smooths  use  a span  of  .2 


(a)  Real  Data:  First  Event  (b)  Real  Data:  Second  Event 
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Figure  5.4.  (a)  (b)  Plots  of  martingale  residuals  vs.  omitted  gender  (c)  Score  for  A^i  vs. 


CHAPTER  6 

SUMMARY  AND  FUTURE  RESEARCH 


In  this  dissertation,  we  have  described  how  a competing  risks  setup  can  be  used  to 
define  proportional  hazards  models  for  analyzing  informatively  censored  failure  time 
data.  A proportional  hazards  model  was  developed  for  informatively  censored  failure 
times  to  a primary  event  and  times  to  followup  events.  The  appropriate  partial 
likelihood  was  expressed  using  a counting  process  representation  which  provided  a 
convenient  structure  to  establish  the  uniqueness  and  the  asymptotic  normality  of 
MPLEs.  Simulation  studies  were  used  to  examine  the  effect  on  MPLEs  of  the  expected 
proportions  of  observed  outcomes,  the  appropriateness  of  a normal  approximation  to 
the  sampling  distributions  of  MPLEs,  and  the  performance  of  MPLEs  when  censoring 
is  considered  as  random  when,  in  fact,  it  is  informative.  The  results  indicated  a good 
normal  approximation  for  moderately  large  sample  sizes  and  a somewhat  poor  normal 
approximation  to  the  sampling  distribution  of  a for  small  sample  sizes.  Also,  bias, 
MSE,  and  the  mean  width  of  95  % confidence  interval  for  a became  large  when  the 
censoring  rates  increased,  and  the  performance  of  MPLE  of  P was  better  in  terms  of 
bias,  MSE,  and  the  mean  width  of  95  % confidence  interval  for  /3  when  the  informative 
censoring  was  accounted  for  in  the  model.  Martingale  and  score  residual  plots  were 
studied  to  demonstrate  their  usefulness  in  detecting  model  misspecification.  Finally, 
the  proposed  model  was  applied  to  a real  data  set  from  Pediatric  Oncology  Group, 
and  the  results  and  residual  plots  were  interpreted. 

Possible  topics  for  future  research  are 
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(i)  alternatives  to  h{xi)  — exp{xia)  to  specify  dependence  between  the  first  and 

second  failure  times, 

(ii)  models  for  hazard  functions  beyond  the  second  event, 

(iii)  methods  to  analytically  assess  model  misspecification, 

(iv)  test  of  informativeness  of  censoring  in  estimating  the  major  failure  time  param- 
eter(s). 

Modeling  beyond  the  second  event  may  involve  too  many  parameters  requiring 
some  restrictions  in  specifying  the  parameters  in  the  model.  Analytical  assessment 
of  model  misspecification  may  require  extensive  computation  to  obtain  the  observed 
significance  level  for  such  as  the  supremum  test  requiring  numerous  realizations  from 
the  null  distribution.  To  test  informativeness  of  censoring,  increase  of  information 
may  be  assessed  by  the  likelihood  ratio  test. 


APPENDIX  A 

SAS  CODE  FOR  APPLICATION  TO  REAL  DATA 
AT  SAS  Code 


Following  is  the  SAS  program  used  to  fit  the  model  in  Table  5.3  to  the  BMT  data, 
libname  c ’ ’ ; 

options  ps=60  ls=80  nocenter; 
proc  format; 

value  agefmt  0 = ’age  <=  2 years’ 

1 = ’ age  > 2 years ’ ; 
value  sexfmt  0 = ’male’ 

1 = ’female’ ; 

run; 

data  one; 

infile  ’bmtorig.dat’; 
input,  tl  t2  ftime  cat; 

tl:  time  to  first  failure 
t2:  time  to  second  failure 
ftime:  followup  time 

cat:  0=first  event  randomly  censored  1=BMT  2=relapse  or 

death! inf ormarive  censoring)  3=relapse  or  death  after  BMT 
4=second  event  ramdomly  censored 

run; 

data  bmt; 
set  one; 

CREATING  VARIABLES  ID,  VISIT,  TSTART,  TSTDP,  AND  STATUS. 

ID:  Patient’s  id 

VISIT:  Event  number  (1,  first  failure;  2,  second  failure) 

TSTART:  Time  of  the  first  failure  if  VISIT=2;  0 otherwise 
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TSTOP;  Time  of  the  kth  failure  if  VISIT=k 
STATUS:  Event  status  (1, failure;  0,  censored) 

retain  id  t start  cat  0; 
array  tt  tl-t2; 
id  + 1; 
cat  = 0; 
tstart  = 0; 
do  over  tt ; 
visit  = _i_; 
if  tt  = . then  do; 
tstop  = ftime; 
status  = 0; 

end; 

else  do; 
tstop=tt ; 
status=l ; 
end; 
output ; 
tstart =t stop ; 
end; 
run; 

data  bmt2; 
set  bmt ; 
a=0; 

type=l ; 

if  cat  in  (0,1,2)  then  do; 
if  cat  in  (1,2)  then  delta=status; 
else  delta=0; 
output ; 
end; 

type=2; 

if  cat  in  (3,4)  then  do; 

if  cat=3  then  delta=status ; 
else  delta=0; 
a=tl ; 
output ; 
end; 
run; 

proc  sort  data=bmt2; 
by  type  tstop; 
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run; 

proc  phreg  data=bmt2; 

model  (tstart,  tstop)*delta(0)  = agegrp  sex  a / 

covb  risklimits; 

where  tstart  < tstop; 

output  out=res_bmt  resmart=rmabmt  ressco=rscoage  rscosex  rscoa 
/order=data; 
id  id  visit  age; 
strata  type; 

title  'STRATIFIED  MULTIPLICATIVE  HAZARDS  MODEL  ’; 
title2  'WITH  COVARIATES  OF  AGEGRP  SEX  ALPHA’; 
run; 
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